Abstract

This paper considers the problem of matrix completion, when some number of the columns are arbitrarily 
corrupted, potentially by a malicious adversary. It is well-known that standard algorithms for matrix completion can 
return arbitrarily poor results, if even a single column is corrupted. What can be done if a large number, or even 
a constant fraction of columns are corrupted? In this paper, we study this very problem, and develop an efficient 
algorithm for its solution. Our results show that with a vanishing fraction of observed entries, it is nevertheless 
possible to succeed in performing matrix completion, even when the number of corrupted columns grows. When 
the number of corruptions is as high as a constant fraction of the total number of columns, we show that again 
exact matrix completion is possible, but in this case our algorithm requires many more - a constant fraction - 
of observations. One direct application comes from robust collaborative filtering. Here, some number of users are 
so-called manipulators, and try to skew the predictions of the algorithm. Significantly, our results hold without any 
assumptions on the number, locations or values of the observed entries of the manipulated columns. In particular, 
this means that manipulators can act in a completely adversarial manner.

I. Introduction 

Recent work in low-rank matrix completion [3], [4] has demonstrated the following remarkable fact: Given
a $p \times n$ matrix of rank $r$ satisfying some technical assumptions (namely, incoherence - we discuss this in detail
below), if its entries are sampled uniformly at random, then with high probability, the solution to a convex and in
particular tractable optimization problem yields exact reconstruction of the matrix, when only $O((n+p)\,r\log^2(n+p))$
entries are sampled.

Yet as our simulations demonstrate, if even a single column of this matrix is corrupted, the output of these
algorithms can be arbitrarily skewed from the true matrix. Partial observation makes a priori identification of
corrupted versus good columns a challenging task. This problem is particularly relevant in so-called collaborative
filtering, or recommender systems. Here, based on only partial observation of users' preferences, one tries to give 
accurate predictions for their unrevealed preferences. It is also well known and well-documented lITSll . 134 1 that such 
recommender systems are susceptible to manipulation. It is thus of interest to develop efficiently scalable algorithms 
that can successfully predict preferences of the honest users, while identifying the manipulators. 

This paper studies this precise problem. We do so by exploiting the algebraic structure of the problem: the
non-corrupted columns form a low-rank matrix, while the corrupted columns can be seen as a column-sparse matrix.
Thus, the mathematical problem we address is to separate a low-rank matrix from a column-sparse matrix, given
only partial observations. Specifically, the problem this paper addresses is as follows. Suppose we are given a
partially observed matrix $M$, and we know that the full matrix can be decomposed as

$$M = L_0 + C_0,$$

where $L_0$ is low-rank and $C_0$ has only a few non-zero columns. Here both components may have arbitrary magnitude;
the rank and column/row space of $L_0$ as well as the number and positions of non-zero columns of $C_0$ are unknown.
Can we efficiently recover the matrix $L_0$ on the non-corrupted columns, and also identify the non-zero columns of
$C_0$? And how does the number of corrupted columns impact the number of observations needed?

We provide an affirmative answer to the first question, and provide finite sample performance bounds that
move towards answering the second. We give a convex optimization formulation, and sufficient conditions for when
this optimization problem yields exact recovery of $L_0$ and identification of the corrupted columns. In particular,
our results imply the following: if we observe only a vanishing fraction of entries, our convex optimization-based
algorithm recovers $L_0$ exactly even in the face of an increasing number of corrupted columns. If a constant fraction
of the columns are corrupted, then our algorithm succeeds in identifying them and recovers $L_0$ exactly, but now
requires a constant fraction of observed entries. We require the locations of the observed entries in the non-corrupted
columns (i.e., in $L_0$) to be chosen uniformly at random; significantly, however, we do not assume anything about the
number or locations of observations for the corrupted columns.
Motivating Applications

A primary motivation for our investigation is the problem of Robust Collaborative Filtering. In online commerce
and advertisement, companies collect user rankings for products and would like to predict user preferences based
on these incomplete rankings; this is the problem known as collaborative filtering (CF). Most popular in the news is
the so-called Netflix problem liZTl . but such recommender systems are of increasing popularity and importance in 
online commerce. There is a large and growing literature on CF; see lUl, ||3j| and the references therein. Many CF 
algorithms have been developed (see e.g. lfT2l . ifTTl . ifTSl . ||25l . Il24l . ||23l . |[3l|). In many of the settings mentioned 
(again, most well-known in this category is the Netflix problem) this collaborative filtering problem is usually cast
as a matrix completion problem, where one tries to recover a low-rank matrix $L_0$ from its partially observed entries.
However, the quality of prediction may be seriously hampered by (even a small number of) manipulators: potentially
malicious users, who calibrate (possibly in a coordinated way) their rankings and the entries they choose to rank
in an attempt to skew predictions [34]. In the matrix completion framework, manipulative users correspond to the
setting where some of the columns of the matrix $M$ are provided by an adversary. The ratings of the authentic
users correspond to a low-rank matrix, while the corrupted ratings correspond to a column-sparse matrix. Therefore, in
order to perform collaborative filtering with robustness to manipulation, we need to identify the non-zero columns
of $C_0$ and at the same time recover $L_0$, given only a set of incomplete entries. This falls precisely into the scope of
our problem. Our robust matrix completion results therefore lead to a provably correct robust CF algorithm. We note
that in this paper we assume uniform sampling of the observed entries. This assumption can be relaxed, although
we do not provide the details here.

Another motivation is robust Principal Component Analysis (PCA) with partially observed data. In the robust 
PCA problem fi5\, f36l one is given a data matrix, of which most of the columns correspond to authentic data 
points and lie in a low-dimensional space - the space of principal components. The remaining columns are outliers. 
The goal is to negate the effect of outliers and recover the principal components. In many situations such as medical 
research (see e.g. [5]), the data matrix is only partially observed. Thus the problem of partially observed Robust
PCA (recovering the principal components in the face of only partial observations, and also corrupted points)
falls directly into our framework.



II. Problem Setup

Suppose there is a $p \times n$ data matrix $M$; among the $n$ columns, a fraction $1-\gamma$ of them span an $r$-dimensional
subspace of $\mathbb{R}^p$, and the remaining columns are arbitrarily corrupted. One is given only partial observation of
the matrix $M$, and the goal is to infer the true subspace of the non-corrupted columns and the identities of the
corrupted ones. Notice that neither the true subspace nor its dimension $r$ is known, and no restriction is imposed on
the corrupted columns except that the total number of them is controlled: they need not follow any probabilistic
distribution, and they may be chosen by some adversary who aims to skew one's inference of the non-corrupted
columns.

Under the above setup, it is clear that the data matrix $M$ can be decomposed as

$$M = L_0 + C_0.$$

Here $L_0$ is the matrix corresponding to the non-corrupted columns; thus $\mathrm{rank}(L_0) = r$ and at most $(1-\gamma)n$ of
the columns of $L_0$ are non-zero. $C_0$ is the matrix corresponding to the corrupted columns; thus at most $\gamma n$ of the
columns of $C_0$ are non-zero. Only some of the entries of $M$ are observed. Let $\Omega \subseteq [p] \times [n]$ be the set of indices
of the observed entries, and $\mathcal{P}_\Omega$ be the orthogonal projection onto the linear subspace of matrices supported on $\Omega$,
i.e.,

$$(\mathcal{P}_\Omega(X))_{ij} = \begin{cases} X_{ij}, & (i,j) \in \Omega, \\ 0, & (i,j) \notin \Omega. \end{cases}$$

With this notation, our goal is to exactly recover the column space of $L_0$ and the locations of the non-zero columns
of $C_0$, given $\mathcal{P}_\Omega(M)$.

A. Assumptions

In general, it is not always possible to meet our objective of completing a low-rank matrix in the presence
of corrupted columns. Indeed, under some circumstances, there are identifiability issues which make the problem
ill-posed. For example, if one row or column of $L_0$ is completely unobserved, there is no hope of recovering that
row or column. On the other hand, if $L_0$ has only one non-zero column, it is also impossible to distinguish $L_0$ from
$C_0$. Finally, if $L_0$ has only one non-zero row, recovering $L_0$ is infeasible unless that particular row is fully observed.
To avoid such meaningless situations, we will impose that $L_0$ satisfy the now standard incoherence condition [3]
and that the observed entries of $L_0$ are sampled uniformly at random. We note again that we make no assumptions on how
the entries of $C_0$ are sampled, and moreover these entries could be adversarially chosen.


Incoherence Conditions: Suppose the Singular Value Decomposition (SVD) of $L_0$ is $L_0 = U_0 \Sigma_0 V_0^\top$. Let
$e_i$ be the $i$th standard basis vector. We assume that the matrix $L_0$ satisfies the following two incoherence conditions, with
parameter $\mu_0$:

$$\max_{i} \|U_0^\top e_i\|^2 \le \mu_0 \frac{r}{p},$$

$$\max_{j} \|V_0^\top e_j\|^2 \le \mu_0 \frac{r}{(1-\gamma) n}.$$

Given a small incoherence parameter $\mu_0$, the first condition asserts that the left singular vectors of $L_0$ are spread out.
Without such a condition, matrix completion does not make sense, since it would be possible for the matrix $L_0$
to also be row-sparse, and one cannot hope to recover a row-sparse matrix with sparse observations, even without
outliers. Consequently, this is a standard assumption made in the matrix completion literature [3], [14], and $\mu_0$
is likely to be small for many reasonable models [3].

The second condition asserts that the right singular vectors of $L_0$ are incoherent, and it essentially enforces the
condition that the information about the column space of $L_0$ is spread out among the columns. This condition is
important in the face of corrupted columns. If, for instance, a column of $L_0$ were not in the span of all the other
columns, one could not hope to recover it or distinguish it from one of the corrupted columns. This condition is
standard in the robust PCA literature, and most practical problems have a very small parameter (e.g., [37]).
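To make the two conditions concrete, here is a minimal sketch that evaluates them in the simplest case, a rank-one matrix $L_0 = u v^\top$ whose vector $v$ is zero on the corrupted columns. It assumes the normalizations $\mu_0 r/p$ for the left singular vector and $\mu_0 r/n_1$ for the right one, where $n_1$ is the number of non-corrupted columns; the function name and the example vectors are hypothetical and not from the paper.

```python
import math


def rank_one_incoherence(u, v, clean_cols):
    """Smallest mu_0 for which L0 = u v^T (rank r = 1) meets both conditions.

    Assumes v is zero outside clean_cols, so V0 = v / ||v|| is supported on
    the non-corrupted columns only.
    """
    p = len(u)
    n1 = len(clean_cols)
    u_sq = sum(x * x for x in u)
    v_sq = sum(v[j] * v[j] for j in clean_cols)
    # For rank one, ||U0^T e_i||^2 = u_i^2 / ||u||^2 and similarly for V0.
    mu_left = p * max(x * x for x in u) / u_sq
    mu_right = n1 * max(v[j] * v[j] for j in clean_cols) / v_sq
    return max(mu_left, mu_right)


clean = [0, 1, 3]                      # column 2 plays the corrupted column
v = [1.0, 1.0, 0.0, 1.0]
flat = rank_one_incoherence([1.0, 1.0, 1.0, 1.0], v, clean)
spiky = rank_one_incoherence([1.0, 0.0, 0.0, 0.0], v, clean)
```

A perfectly spread-out `u` gives the smallest possible value, while a `u` concentrated on a single row gives a value equal to `p`; the latter is the row-sparse situation in which completion from few samples is hopeless.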

For the corrupted columns, we make only one assumption: they are indeed corrupted. That is, we assume only
the following. Suppose an oracle were to provide the true column space, $U_0$, of the low-rank matrix $L_0$. There
would be no way to complete the observed entries of any of the columns of $C_0$ so that it lies in the column
space of $L_0$. If this does not hold, then there is no reasonable way to distinguish a corrupted column from an
authentic column. Moreover, such entries will not affect the recovery of the unobserved entries in the authentic
columns. In terms of the collaborative filtering application, this is akin to saying that we will only call a user a
"manipulator" if the corresponding entries indeed would manipulate the entries of the authentic users. Other than
this identifiability requirement, we make no assumptions whatsoever on the corrupted columns. The incoherence
assumptions are imposed on the column and row spaces of $L_0$, not on $M$, as are the sampling assumptions, and
thus the corrupted columns are not restricted in any way by these. One consequence of this is that we are not able
to recover the complete corrupted columns, but we are able to recover their identities.

Sampling Model: Let $\mathcal{I}_0 \subset [n]$ be the set of indices of the corrupted columns. Let $\tilde\Omega \subseteq [p] \times \mathcal{I}_0^c$ be the
set of indices of observed entries on the non-corrupted columns (i.e., the non-zero columns of $L_0$). We assume that
$\tilde\Omega$ is sampled uniformly at random from all size-$m$ subsets of $[p] \times \mathcal{I}_0^c$ (this is sometimes called sampling without
replacement); so $m$ is the number of observed entries on the non-corrupted columns. Note that no assumption
whatsoever is imposed on the observed entries on the corrupted columns; the adversary may choose to fill in all
entries on columns in $\mathcal{I}_0$ or just a fraction of them, and the locations of these observed entries may be chosen
randomly or depending on $L_0$. On the other hand, as we do not aim at (and there is no hope of) recovering the
unobserved entries of $C_0$, we can assume without loss of generality that all the unobserved entries of $C_0$ are zero,
i.e., $\mathcal{P}_\Omega(C_0) = C_0$.
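The sampling model can be sketched in a few lines. The snippet below is only an illustration under stated assumptions: it draws the clean observation set uniformly without replacement from the non-corrupted columns, lets a stand-in "adversary" reveal whatever entries it likes on the corrupted columns, and applies the entrywise projection onto the observed set. The function names, sizes and seed are hypothetical.

```python
import random


def sample_clean_observations(p, n, corrupted, m, seed=0):
    """Draw m index pairs uniformly without replacement from [p] x (clean columns)."""
    rng = random.Random(seed)
    clean_cols = [j for j in range(n) if j not in corrupted]
    candidates = [(i, j) for i in range(p) for j in clean_cols]
    return set(rng.sample(candidates, m))


def project_onto_observed(X, omega):
    """Keep the entries of X indexed by omega and zero out all the others."""
    return [[X[i][j] if (i, j) in omega else 0.0 for j in range(len(X[0]))]
            for i in range(len(X))]


p, n, corrupted, m = 4, 5, {1, 3}, 6
clean_omega = sample_clean_observations(p, n, corrupted, m, seed=1)
# No assumption is made on the corrupted columns: here the adversary reveals
# all of column 1 and a single entry of column 3.
adversarial = {(i, 1) for i in range(p)} | {(0, 3)}
omega = clean_omega | adversarial

X = [[1.0] * n for _ in range(p)]
observed = project_onto_observed(X, omega)
```

Because the projection only masks entries, applying it twice gives the same result as applying it once, which is the sense in which it is an orthogonal projection.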

B. Notation and Preliminaries 

We provide here a brief summary of the notation used in the paper. We abuse notation by letting $\Omega$ (and $\Omega^c$)
be both a set of matrix entries, and also the linear space of matrices supported on these entries; similarly $\mathcal{I}_0$ and
$\mathcal{I}_0^c$ denote both the set of column indices and the linear space of matrices supported on these columns. For a linear
subspace $S$, $\mathcal{P}_S$ is the orthogonal projection onto $S$. The SVD of $L_0$ is $U_0 \Sigma_0 V_0^\top$. Let $\mathcal{P}_{U_0}$ be the projection of
each column of a matrix onto the column space of $L_0$, given by $\mathcal{P}_{U_0}(A) = U_0 U_0^\top A$; similarly for the row space,
$\mathcal{P}_{V_0}(A) = A V_0 V_0^\top$. We write $A \in \mathcal{P}_{U_0}$ for any $A$ obeying $\mathcal{P}_{U_0}(A) = A$, i.e., the column space of $A$ is in the
column space of $U_0$. Similarly, $A \in \mathcal{P}_{V_0}$ denotes $\mathcal{P}_{V_0}(A) = A$. The subspace $T_0$ is defined as the span of matrices
with the same column or row space as $L_0$; thus we have

$$T_0 = \{U_0 X^\top + Y V_0^\top : X \in \mathbb{R}^{n \times r},\ Y \in \mathbb{R}^{p \times r}\}$$

and

$$\mathcal{P}_{T_0}(A) = \mathcal{P}_{U_0}(A) + \mathcal{P}_{V_0}(A) - \mathcal{P}_{U_0}\mathcal{P}_{V_0}(A).$$

The complementary operators are defined as usual:

$$\mathcal{P}_{U_0^\perp}(A) = (I - U_0 U_0^\top) A,$$
$$\mathcal{P}_{V_0^\perp}(A) = A (I - V_0 V_0^\top),$$
$$\mathcal{P}_{T_0^\perp}(A) = \mathcal{P}_{V_0^\perp}\mathcal{P}_{U_0^\perp}(A) = (I - U_0 U_0^\top) A (I - V_0 V_0^\top).$$
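As a small worked example of these operators, the sketch below implements the projection onto $T_0$ and its complement for matrices stored as lists of rows, assuming orthonormal factors `U0` (size p x r) and `V0` (size n x r) are given. The helper names and the tiny 2 x 3 example are hypothetical and chosen so that the arithmetic can be checked by hand.

```python
def matmul(A, B):
    """Plain matrix product for matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def combine(A, B, sign):
    """Entrywise A + sign * B."""
    return [[a + sign * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def proj_T0(A, U0, V0):
    """P_T0(A) = P_U0(A) + P_V0(A) - P_U0 P_V0(A)."""
    PU = matmul(U0, transpose(U0))          # p x p projector onto col space
    PV = matmul(V0, transpose(V0))          # n x n projector onto row space
    PU_A = matmul(PU, A)
    A_PV = matmul(A, PV)
    return combine(combine(PU_A, A_PV, 1.0), matmul(PU_A, PV), -1.0)


def proj_T0_perp(A, U0, V0):
    """Complementary projection, computed as A - P_T0(A)."""
    return combine(A, proj_T0(A, U0, V0), -1.0)


U0 = [[1.0], [0.0]]                 # column space spanned by e_1 in R^2
V0 = [[0.0], [1.0], [0.0]]          # row space spanned by e_2 in R^3
A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
on_T0 = proj_T0(A, U0, V0)
off_T0 = proj_T0_perp(A, U0, V0)
```

In this example the component on $T_0$ keeps the first row and the second column of `A`, and the complement keeps only the entries that lie outside both, which matches $(I - U_0 U_0^\top) A (I - V_0 V_0^\top)$.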
For a vector $x$, $x_i$ is its $i$th entry. For a matrix $A$, $A_i$ is its $i$th column and $A_{ij}$ is its $(i,j)$-th entry. Six matrix
norms are used: $\|A\|_*$ is the nuclear norm (the sum of the singular values), $\|A\|$ is the spectral/operator norm (the
largest singular value), $\|A\|_\infty$ is the matrix infinity norm (the largest absolute value of the entries), $\|A\|_{1,2}$ is the
sum of the $\ell_2$ norms of the columns of $A$, $\|A\|_{\infty,2}$ is the largest $\ell_2$ norm of the columns of $A$, and finally $\|A\|_F$ is the
Frobenius norm.
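The column-wise norms are the less familiar ones, so a short illustration may help. The sketch below computes the entrywise infinity norm, the $(1,2)$ norm, the $(\infty,2)$ norm and the Frobenius norm for a matrix stored as a list of rows; the nuclear and spectral norms are left out because they require a singular value decomposition. The function names and the example matrix are hypothetical.

```python
import math


def column_l2_norms(A):
    """Euclidean norm of every column of A (A is a list of rows)."""
    return [math.sqrt(sum(x * x for x in col)) for col in zip(*A)]


def norm_1_2(A):
    """Sum of the column l2 norms; the column-sparsity penalty."""
    return sum(column_l2_norms(A))


def norm_inf_2(A):
    """Largest column l2 norm; the dual norm of the (1,2) norm."""
    return max(column_l2_norms(A))


def norm_entry_inf(A):
    """Largest absolute value of an entry."""
    return max(abs(x) for row in A for x in row)


def norm_frobenius(A):
    return math.sqrt(sum(x * x for row in A for x in row))


A = [[3.0, 0.0],
     [4.0, 1.0]]          # columns (3, 4) and (0, 1) have l2 norms 5 and 1
```

For this matrix the $(1,2)$ norm is 6, the $(\infty,2)$ norm is 5, the entrywise infinity norm is 4, and the Frobenius norm is $\sqrt{26}$.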

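Notation Related to Non-corrupted Columns: Let $n_1 = n - |\mathcal{I}_0| = (1-\gamma)n$ be the number of
uncorrupted columns. Let $\mathcal{R}: \mathcal{I}_0^c \to \mathbb{R}^{p \times n_1}$ be the following linear mapping: given $X \in \mathcal{I}_0^c$, remove all its columns
in $\mathcal{I}_0$ (which by definition are zero columns), and denote the resulting column-truncated matrix as $\mathcal{R}(X)$. Note
that this is an injection, and thus $\mathcal{R}^{-1}$ is well-defined. (We define $\mathcal{R}$ because we frequently need to operate on
the $\mathcal{I}_0^c$ portion of a matrix that is all zero on $\mathcal{I}_0$, and we can think of $\mathcal{R}$ as simply making the size of the matrix
"compatible" with the operation applied to it.) Note that by assumption $V_0^\top \in \mathcal{I}_0^c$; let $\tilde V_0 = \mathcal{R}(V_0^\top)^\top$, and let
$\tilde T_0$ denote $T_0$ restricted to the columns in $\mathcal{I}_0^c$.

$\mathcal{P}_{\tilde V_0}$, $\mathcal{P}_{\tilde T_0}$ and $\mathcal{P}_{\tilde\Omega}$ are defined accordingly. By definition $\tilde\Omega = \Omega \cap \mathcal{I}_0^c$ is the set of clean and observed entries, and
$\tilde\Omega^c = \Omega^c \cap \mathcal{I}_0^c$ is the set of clean but unobserved entries. $m = |\tilde\Omega|$ and $\rho = m/(p n_1)$ are thus the number and fraction of
observed clean entries, respectively.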

The letters $\eta$ and $c$ and their derivatives ($\eta_1$, $c_2$, etc.) denote unspecified constants that are, however, universal
in that they are independent of $p$, $n$, $\gamma$, $m$ and $r$.

A summary of the notation:

$\mathcal{I}_0$            Set of indices of the corrupted columns
$\Omega$             Set of observed entries
$\tilde\Omega$            Set of observed entries on the non-corrupted columns ($= \Omega \cap \mathcal{I}_0^c$)
$p$             Number of rows of $L_0$
$n$             Number of columns of $L_0$
$\gamma$             Fraction of corrupted columns ($= |\mathcal{I}_0|/n$)
$n_1$            Number of non-corrupted columns ($= (1-\gamma)n = |\mathcal{I}_0^c|$)
$m$             Number of observed entries on the non-corrupted columns ($= |\Omega \cap \mathcal{I}_0^c|$)
$\rho$             Fraction of observed entries on the non-corrupted columns ($= m/(p n_1)$)
$\mu_0$            Coherence parameter for $U_0$ and $V_0$
$T_0$            The span of $p \times n$ matrices with the same row space or column space as $L_0$
$\tilde T_0$            $T_0$ restricted to the columns in $\mathcal{I}_0^c$
$\tilde V_0$            $V_0$ restricted to the columns in $\mathcal{I}_0^c$
$(\hat L, \hat C)$       An optimal solution of the oracle problem (defined later)
$\hat U$, $\hat V$         The left and right singular vectors of $\hat L$, respectively (defined later)
$\hat T$             The span of matrices with the same row or column space as $\hat L$ (defined later)
$\hat{\mathcal{I}}$             The column support of $\hat C$ (defined later)
$\Omega_0$            A generic subset of entries of $[p] \times [n_1]$

III. Main Results and Consequences 

The main result of this paper says that despite the corrupted columns, and despite the partial observation, we can
nevertheless simultaneously recover $L_0$, the non-corrupted columns, and identify $\mathcal{I}_0$, the positions of the corrupted
columns, as long as the number of corrupted columns and the number of unobserved entries are controlled. Moreover, this can
be achieved efficiently by solving a tractable convex program. Our algorithm is as follows.

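Algorithm 1 Manipulator Pursuit

Solve for optimum $(L^*, C^*)$:

$$\begin{aligned} \underset{L,\,C}{\text{minimize}} \quad & \|L\|_* + \lambda \|C\|_{1,2} \\ \text{subject to} \quad & \mathcal{P}_\Omega(L + C) = \mathcal{P}_\Omega(M) \end{aligned} \tag{1}$$

Set $\mathcal{I}' = \{j : C^*_{ij} \neq 0 \text{ for some } i\}$, $L' = \mathcal{P}_{\mathcal{I}'^c}(L^*)$.
Output: $L'$, $\mathcal{I}'$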
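The convex program itself needs a semidefinite or proximal solver, which is outside the scope of a short example, but the final step of the algorithm is easy to sketch. The snippet below assumes a solver has already returned the pair (`L_star`, `C_star`) and reads the last line of the algorithm as "flag every column of the second component that is non-zero, then zero out those columns of the first component". The function names and the numerical tolerance `tol` are assumptions added for illustration; the algorithm as stated tests for exact non-zeros.

```python
def column_support(C, tol=1e-9):
    """Indices of columns of C with at least one entry larger than tol in magnitude."""
    return {j for j in range(len(C[0])) if any(abs(row[j]) > tol for row in C)}


def postprocess(L_star, C_star, tol=1e-9):
    """Return (L', I'): L_star with flagged columns zeroed, and the flagged set."""
    flagged = column_support(C_star, tol)
    L_prime = [[0.0 if j in flagged else x for j, x in enumerate(row)]
               for row in L_star]
    return L_prime, flagged


# Hypothetical solver output: the third column was attributed to C.
L_star = [[1.0, 2.0, 9.0],
          [1.0, 2.0, 9.0]]
C_star = [[0.0, 0.0, 5.0],
          [0.0, 0.0, 0.0]]
L_prime, flagged = postprocess(L_star, C_star)
```

In floating-point practice the tolerance matters, since a numerical solver rarely returns exact zeros; the value used here is only a placeholder.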
We say our algorithm succeeds if we always have $\mathcal{P}_{\mathcal{I}_0^c}(L') = L_0$, $\mathcal{P}_{U_0}(L') = L'$, and $\mathcal{I}' = \mathcal{I}_0$. We recall our
single restriction on the corrupted columns: they are indeed corrupted, in that they cannot be completed so as to lie
in the column space of the true matrix $L_0$; failing this, asking for $\mathcal{I}_0$ to be recovered does not make sense, nor
is it even clear why such a column should be called "corrupted."

A. Main Theorems 

Our first main theorem states that under some natural conditions, our algorithm exactly recovers the non-corrupted
columns and the identities of the corrupted columns with high probability. Here and in what follows, by with
high probability, we mean with probability at least 1 — cn^^ for some constant c > 0. Recall that p is the fraction 
of observed entries on the non-corrupted columns and $\gamma$ is the fraction of corrupted columns.
Theorem 1. Suppose ni > p > 32 and r < f, j < ^, p > £. If {f, 7, £) satisfies 

P 

and _ 2 

T^<V2- , (3) 

(1 + 1^) /igr-3 1og«(4ni) 

where rji and 772 are absolute constants, then with high probability Algorithm^^with A = \J ^¥JI on io°-^ {in Tj ^^^^^^^y 
succeeds. 

Remark. Notice the theorem does not require any assumption on the observed entries on the corrupted columns.
In the case of collaborative filtering, a malicious user can choose to rate any subset of products in an arbitrary
way. Also notice that to choose $\lambda$, one does not need to know the exact values of $\rho$, $\gamma$, and $r$, but rather bounds on
them.

We give three corollaries to illustrate the consequences of Theorem 1.

Corollary 1. If r < Vi^j^' P ^ V2 '"^i/"^'* . 7 < '?3^' ^^^'^ Algorithm^^with A = \J^^-^ succeeds with high 
probability. 

Remark. Notice that the choice of $\lambda$ is universal and does not depend on any unknown quantity. In the case of
$p = \Theta(n_1)$, we can recover the non-corrupted columns with a vanishing fraction of entries observed and a growing
number of corrupted columns.

Corollary 2. If p > 0.1 and r < f < rji ^ jQg3^^4„ y then Algorithm^with A = t^n^^°s_^^^^^) succeeds with high 
probability if 

/igr3 log''(4ni) 

Remark. With a constant fraction of entries observed, the fraction of corrupted columns can be as large as one
over a polylog factor. If $\rho = 1$, we partially recover the result in [36].
Corollary 3. If "/ = 0, r < f, and m satisfies 



m > rjiPf^r nlog (4n) 

then w.h.p. Algorithm^with X — n has a unique solution {Lq, 0). 

Remark. This (partially) recovers the matrix completion result in jj^, (i9\l. 

Benign Corruptions: Recall that thus far, the corrupted columns are not subject to any restrictions. In
particular, the incoherence conditions are not imposed on $C_0$, and the number and locations of the observed entries
on the corrupted columns can be arbitrary. If the corruptions $C_0$ are not entirely adversarial, however, and in fact
satisfy some additional assumptions, then we can do better: the condition on $\gamma$ is weaker, and the polylog factor
can be eliminated. This is stated in the following theorem.

Theorem 2. Suppose n-i > p > 32 and 7 < 7- In addition, assume that the entries on the corrupted columns are 
fully observed, and the left singular vectors of the full matrix M (and not only those of Lq) denoted by Um, satisfy 
the following incoherence condition: 

max WVuM^^iWl. < Mo-- (4) 

l<i<p p 

If (r, 7, p) satisfies 

^ lUP^r"^ log^(4ni) 
%/plog(p) 
and

— o 

7 P 

where rji and 772 are absolute constants, then wth high probability Algorithm jTj with A = 4^7^ strictly succeeds. 
Remark. If $p$ scales linearly with $n_1$, then we have no polylog gap. In particular, we can recover the non-corrupted
columns $L_0$ exactly, in the presence of a constant fraction of corrupted columns, given a constant fraction of
observations.



B. Connections to Prior Work and Innovation 

In the matrix completion problem, one seeks to recover a low rank matrix from a small number of its entries. It 
has recently been shown that by using convex optimization f3l, fT], f9l, f28l or singular value thresholding |FT4l, one 
can exactly recover an $n \times n$ rank-$r$ matrix with high probability from as few as $O(nr\,\mathrm{polylog}\,n)$ entries. Our paper
extends this line of work and shows that even if the observed entries on some columns are completely corrupted (by 
possibly adversarial noise), one can still recover the non-corrupted columns as well as the identity of the corrupted 
columns. 

The centerpiece of our algorithm is a convex optimization problem, that is a convex proxy to a very natural 
(but intractable) algorithm for such recovery, namely, finding a low-rank matrix L and a column-sparse matrix C 
consistent with the observed data. Such convex surrogates for rank and support functions have been used extensively 
in vector problems and low -rank matrix problems (e.g., IS), ll29l . and more closely related to our topic of interest, 
matrix completion papers, e.g., ||4l, S, Cgl) and matrix decomposition papers |l2^, |'6'|. Our analysis also adapts 
important ideas from the previous literature, especially the ideas of dual certification and Golfing Scheme in |3|, 

m- 

Besides the obvious difference in the problem setup, our paper also departs from the previous work in terms of
mathematical analysis. In particular, in all the above works, the intended outcome is known a priori: their goal is
to output a matrix or a pair of matrices exactly equal to the original one(s). In our setting, however, the optimal
solution of the convex problem is in general neither the original low-rank matrix $L_0$ nor the matrix $C_0$ which
consists of only the corrupted columns. This critical difference requires a novel analysis that builds on the method
of the "Oracle Problem" introduced in [36].

Our work is also related to the problem of separating a low-rank matrix and an overall sparse matrix from their
(possibly partially observed) sum, with sufficient conditions for successful recovery provided in [2], [6]. Compared to
this line of work, our results indicate that separation is still possible even if the low-rank matrix is added to a
column-sparse matrix instead of an overall sparse matrix. Moreover, although we do not pursue this here, our
techniques allow us to establish results on separating three components: a low-rank matrix, an overall sparse matrix,
and a column-sparse matrix.

The presence of (randomly) missing entries and corrupted columns, and thus the need to deal with three matrix
structures simultaneously, requires the introduction of new ingredients. In particular, one important technical
innovation is the development of new bounds on the $\|\cdot\|_{\infty,2}$ norms of certain random matrices.



IV. Proofs of Main Theorems 

In this section we prove our main theorems. The first five subsections are devoted to the proof of Theorem 1,
while the last one proves Theorem 2.

The proof is quite technical, and requires a number of intermediate results. To clarify the exposition, and also
to provide a high-level roadmap of what we do and why we do it, we first outline the main steps of the proof in
Section IV-A. The proof itself is contained in Sections IV-B to IV-E. Then in Section IV-F we show that under
additional assumptions on the outliers, the proof of the dual certificate can be simplified, and a stronger recovery result
can be obtained, namely Theorem 2.



A. Skeleton of the Proof 

In this section we provide a proof skeleton of our main theorem. The full proof details are given in the subsequent
sections. The main roadmap to proving that a convex optimization problem recovers a desired solution is to demonstrate
that with high probability, one can find a dual certificate of optimality of the desired solution. This basic recipe
underlies many of the proofs in sparse recovery and low-rank recovery [2], [3], [6]. A central roadblock to this
approach is that unless the adversary's corrupted columns happen to be perfectly perpendicular to the column space
of the true low-rank matrix, the convex optimization problem given will not precisely recover $L_0$. The reason is
simple: if the corrupted columns have a non-perpendicular component, then some part of that will be put into the
$L$ matrix the optimization recovers. Algorithmically, this matter is irrelevant: as long as the corrupted columns are
identified, and the recovered $L$ matches the desired $L_0$ on the non-corrupted columns, our objective is met, and the
problem is solved. The analysis, however, is significantly complicated: because we do not recover $L_0$ exactly,
we no longer explicitly know for what to write a certificate of optimality.

Beyond this, significant challenges arise because of the simultaneous presence of three matrix structures: low- 
rank, matrix-sparse, and column-sparse. This requires a number of additional innovations. The six main steps of the 
proof are as follows. 

Step 1. The first step is quite standard in the matrix completion literature. It says that with high probability,
under the sampling regime of the stated results, the sampling operator $\mathcal{P}_{\tilde\Omega}$ on the non-corrupted columns is
invertible on the span of matrices with either the same column or row space as $L_0$. Without such a result, matrix
completion under any algorithm would be hopeless. We note that in our case, we cannot make any statements
about the operator $\mathcal{P}_\Omega$, which involves sampling on the corrupted columns, since we make no assumptions on the
distribution of the samples on the corrupted columns. The result we prove below essentially says the following:
when $m \ge \frac{64}{3}\mu_0 r (n_1 + p)\beta \log(n_1 + p)$ (as we require in our main theorems), then with high probability,

$$\left\| \frac{p n_1}{m}\, \mathcal{P}_{\tilde T_0} \mathcal{P}_{\tilde\Omega} \mathcal{P}_{\tilde T_0} - \mathcal{P}_{\tilde T_0} \right\| \le \frac{1}{2}. \tag{7}$$

We refer to this condition repeatedly in what follows.

Step 2. For the algorithm to succeed, it is sufficient for the recovered pair $(L^*, C^*)$ to have the right column
space and correct non-corrupted columns for $L^*$, and the right column support for $C^*$. To identify such a solution,
we consider the following Oracle Problem; here $\Gamma$ denotes the space of matrices supported on the set of all entries
in the non-corrupted columns plus the observed entries in the corrupted columns.

$$\begin{aligned} \underset{L,\,C}{\text{minimize}} \quad & \|L\|_* + \lambda \|C\|_{1,2} \\ \text{subject to} \quad & \mathcal{P}_\Gamma(L + C) = \mathcal{P}_\Gamma(M) \end{aligned}$$

The Oracle Problem is feasible, since the true pair $(L_0, C_0)$ is feasible. Let $(\hat L, \hat C)$ denote the solution to the Oracle
Problem. We must identify conditions that a dual certificate must satisfy to guarantee that $(\hat L, \hat C)$ is an optimal
solution to Algorithm 1 and that any optimal solution to Algorithm 1 must also have the correct column space and
column support.

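Step 3. To state these conditions, we need some definitions.

$$\hat U \hat\Sigma \hat V^\top := \text{the singular value decomposition of } \hat L,$$
$$\hat{\mathcal{I}} := \text{the column support of } \hat C,$$
$$\mathfrak{G}(\hat C) := \left\{ H \in \mathbb{R}^{p \times n} \;\middle|\; \mathcal{P}_{\mathcal{I}_0^c}(H) = 0;\ \forall i \in \hat{\mathcal{I}},\ H_i = \frac{\hat C_i}{\|\hat C_i\|_2};\ \|H\|_{\infty,2} \le 1 \right\}.$$

It is now straightforward to demonstrate that $Q$ is a dual certificate as long as it satisfies the following:

(a) $Q \in \Omega$
(b) $\mathcal{P}_{\hat T}(Q) - \hat U \hat V^\top = 0$
(c) $\|\mathcal{P}_{\hat T^\perp}(Q)\| \le 1$
(d) $\mathcal{P}_{\mathcal{I}_0}(Q) \in \lambda\, \mathfrak{G}(\hat C)$
(e) $\|\mathcal{P}_{\mathcal{I}_0^c}(Q)\|_{\infty,2} \le \lambda$.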



We construct a certificate $Q \in \Omega$ by first constructing a preliminary certificate that satisfies (b) through (e), and then
sampling it according to $\Omega$ and scaling appropriately. We then use concentration inequalities to show that the
sampling procedure is "close enough" to the identity map. Following this program requires some care. In particular,
the equality constraint in (b) must be relaxed, since the concentration inequalities can only guarantee that it is
approximately satisfied with high probability. This is done in the next step.

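Step 4. Consider any feasible perturbation, $(\hat L + \Delta_1, \hat C + \Delta_2)$. Given a $Q$ that satisfies properties (a)-(e)
above, it is immediate to show that $(\hat L + \Delta_1, \hat C + \Delta_2)$ is suboptimal:

$$\|\hat L\|_* + \lambda \|\hat C\|_{1,2} \le \|\hat L + \Delta_1\|_* + \lambda \|\hat C + \Delta_2\|_{1,2}.$$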



Condition (b) above, $\mathcal{P}_{\hat T}(Q) - \hat U \hat V^\top = 0$, comes from the need to show that the above inequality holds for all
values of the perturbation $\Delta_1$, and in particular, its projection onto $\hat T$, the column and row space of $\hat L$. However,
$\Delta_1$ cannot be arbitrary.

Lemma 1. Suppose $\Delta_1, \Delta_2 \in \mathbb{R}^{p \times n}$ are feasible perturbations, i.e., they satisfy $\mathcal{P}_\Omega(\Delta_1) + \mathcal{P}_\Omega(\Delta_2) = 0$. Then
under the sampling regime in the above results, condition (7) holds with high probability, and we have



2pni 



rPr'^Ao 



Then, since $\Delta_1$ cannot be arbitrary, the equality of condition (b) can be relaxed. This leads to alternative
conditions that $Q$ must satisfy.

Proposition 1 (Alternative Dual Certificate Condition). Suppose $\lambda < 1$. Then with high probability, under the
sampling regime of the results, condition (7) holds, and $(\hat L, \hat C)$ is an optimal solution to (1) if there exists $Q$
such that



(a) 
ib') 



(d) 
ie') 



Vf{Q)-UV^ ^Vf-R^^iD), 
for some D with \\D\\p < 



2pni 



< 



2' 



A 

00.2 ~ 2 



If both inequalities are strict, and VigOVy = {0}, then any optimal solution {L' , C) to ([T]) satisfies Vi^ {L') = Lq, 
Vua{L') — L' , and 7^iono(C") — C , which means Algorithm^succeeds. 

Step 5. The next step requires constructing a dual certificate that satisfies properties (b)-(e), and also
(b')-(e'), while ignoring requirement (a). Ignoring (a) essentially allows us to consider the fully observed problem of separating
a low-rank matrix from a column-sparse matrix, a substantially more manageable problem. The certificate that we obtain
satisfies all constraints except for (a), and thus is the one that we then sample. The sampling procedure is described
next.

Step 6. The final step requires us to sample this certificate to obtain $Q$, and then show using concentration inequalities that
the resulting $Q$ satisfies (a) through (e') with high probability. The naive approach does not quite work, and thus
a different sampling scheme is required. We do this using a modification of the approach coined "The Golfing Scheme" [9],
[10]. We sample by a modified batched sampling-with-replacement scheme. The final step requires showing that
Bernstein's inequality still holds under this scheme (since the sampled entries are no longer all independent).

The Oracle Problem approach, the conditions on $\Delta_1$ and $\Delta_2$ in the Lemma above, the alternative conditions for
the certificate that we present here, and the validation of our choice of the certificate, are new. Moreover, because
our objective involves a $\|\cdot\|_{1,2}$ term, our results require us to obtain new concentration results for the dual $\|\cdot\|_{\infty,2}$
bound, which were previously not known (at least to us).



B. Proof of Alternative Dual Certificate Conditions 

In this section, we prove the alternative dual certificate conditions given in Proposition 1. The main idea is simple: the equality constraint of condition (b), namely $\mathcal{P}_{\hat T}(Q) - \hat U\hat V^\top = 0$, comes from considering a perturbation $(\Delta_1, \Delta_2)$ where $\Delta_1$ has arbitrary projection onto the space $\hat T$. However, we need only consider feasible perturbations, i.e., pairs $(\Delta_1, \Delta_2)$ that satisfy $\mathcal{P}_\Omega(\Delta_1) + \mathcal{P}_\Omega(\Delta_2) = 0$. We show that any such pair must obey an additional constraint on $\mathcal{P}_{\hat T}(\Delta_1)$, as given above in Lemma 1. This then allows us to replace the equality constraint of (b) by the inequality in (b'). Now for the details.

The first result is quite standard in the matrix completion literature, and at some level indicates why matrix completion from a small collection of entries is even possible. It says, essentially, that in the space $T_0$ of matrices with the same column or row space as the low-rank matrix $L_0$, the sampling operator causes no loss of information, i.e., it is invertible. More specifically, the result bounds the operator norm of $\frac{pn_1}{m_0}\mathcal{P}_{T_0}\mathcal{P}_{\Omega_0}\mathcal{P}_{T_0} - \mathcal{P}_{T_0}$. The proof follows that of [28, Theorem 3.4]. The only difference is that [28, Theorem 3.4] assumes sampling with replacement, while we assume sampling without replacement, which does not cause a problem, as recently shown in [11].
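Lemma 2. Suppose $\Omega_0 \subseteq [p] \times [n_1]$ is a set of $m_0$ entries sampled uniformly at random without replacement. Then for all $\beta > 1$,

$$\left\|\frac{pn_1}{m_0}\mathcal{P}_{T_0}\mathcal{P}_{\Omega_0}\mathcal{P}_{T_0} - \mathcal{P}_{T_0}\right\| \le \sqrt{\frac{16\beta\mu_0 r(n_1+p)\log(n_1+p)}{3m_0}} \qquad (8)$$

with probability at least $1 - 2\max\{n_1,p\}^{2-2\beta}$, provided that $m_0 > \frac{16}{3}\mu_0 r(n_1+p)\beta\log(n_1+p)$.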
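The quantity bounded in Lemma 2 can be inspected numerically on a small instance. The sketch below is only an illustration: the sizes, the seed, and the helper names `tangent_projector`, `deviation_norm` and `sample_mask` are assumptions made for this example, and the tangent space is written as $T = \{UA^\top + BV^\top\}$ with projection $Z \mapsto P_U Z + Z P_V - P_U Z P_V$.

```python
import numpy as np

def tangent_projector(U, V):
    """Matrix of P_T acting on the row-major vectorisation of a p x n matrix."""
    p, n = U.shape[0], V.shape[0]
    PU, PV = U @ U.T, V @ V.T
    # Row-major vec: vec(A Z B) = (A kron B^T) vec(Z); PV is symmetric.
    return np.kron(PU, np.eye(n)) + np.kron(np.eye(p), PV) - np.kron(PU, PV)

def deviation_norm(U, V, mask):
    """Operator norm of (p*n/m) P_T P_Omega P_T - P_T for a boolean mask."""
    p, n = mask.shape
    m = mask.sum()
    PT = tangent_projector(U, V)
    POmega = np.diag(mask.ravel().astype(float))
    return np.linalg.norm((p * n / m) * PT @ POmega @ PT - PT, 2)

rng = np.random.default_rng(0)
p, n, r = 12, 15, 2
U, _ = np.linalg.qr(rng.standard_normal((p, r)))
V, _ = np.linalg.qr(rng.standard_normal((n, r)))

def sample_mask(m):
    """m distinct entries drawn uniformly at random (without replacement)."""
    idx = rng.choice(p * n, size=m, replace=False)
    mask = np.zeros(p * n, dtype=bool)
    mask[idx] = True
    return mask.reshape(p, n)

for m in (60, 120, 180):
    print(m, deviation_norm(U, V, sample_mask(m)))
```

When every entry is sampled the deviation vanishes, which is the trivial end of the regime the lemma describes.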



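Remark. In particular, when $m_0 \ge \frac{64}{3}\mu_0 r(n_1+p)\beta\log(n_1+p)$, which is satisfied under the assumption of our main theorems, we have w.h.p. that the condition (7) given above holds:

$$\left\|\frac{pn_1}{m_0}\mathcal{P}_{T_0}\mathcal{P}_{\Omega_0}\mathcal{P}_{T_0} - \mathcal{P}_{T_0}\right\| \le \frac{1}{2},$$

and thus $\mathcal{P}_{T_0}\mathcal{P}_{\Omega_0}\mathcal{P}_{T_0}$ is invertible on $T_0$. We will make use of this result throughout the paper.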

The next three lemmas prove some important properties of $(\hat L, \hat C)$, as well as the column and row spaces of $\hat L$ and $L_0$. Indeed, one of the challenges of developing a certificate for the solution to the Oracle Problem is that we must relate properties of $\hat L$, $\hat U$ and $\hat V$ to properties of $L_0$, and in particular $U_0$ and $V_0$. We use these lemmas repeatedly in the sequel. Lemma 3 is an analog of [37, Lemmas 4 and 5].

Lemma 3. Let (Vx[ 
V 



and invertible N G 



prxr 



^ I T'l'! {V )\. We have Vi" (V) — Lq, V(j — Vua' ^ ^ and there exists orthnormal 



such that 










(9) 






(10) 


Vr 


= Vuo+Vv-VuoVv 


(11) 


Vis 


= VoN 


(12) 



Proof: By definition of the oracle problem, we have Vt{L + C) = VriLo + Co). Applying T^xj to both sides 
of the equality and noticing that Lq £ Iq, Co, C € Iq, we obtain 'Pi^{L) = Lq. Then everything except the last 
equality can be proved in exactly the same way as in [37, Lemmas 4 and 5].

Now for the last equality in the lemma. Since $\mathcal{P}_{I_0^c}(\hat L) = L_0$, the columns of $V_0$ and $\bar V_{I_0^c}$ span the same space. Thus there exists an invertible $N \in \mathbb{R}^{r\times r}$ such that $\bar V_{I_0^c} = V_0 N$.

The next lemma is an analog of [37, Lemma 6].
Lemma 4. There exists some H such that H € fin ©(C) and 

UVi,XV) = UqVi,{V^) 



(13) 



Proof: The proof is almost identical to that of 1371 Lemma 6]. Since {L, C) is an optimal solution to the 
Oracle Problem, by convex analysis there exists Qi, Q2, A', and B' such that 



where Qi, Q2 are subgradients to 



and to A 



C 



This means that Qi 



1.2 



some Zi e 'Pfi_ and Q2 = A(i7 + Z2) for some H e 0(C) and Z2 G IS- Let A 
have 

UV^ + ^t/- (A) = UV^ + Vu^^ {A) = XH + Vi^ (B) e T 



h Zi = UqV^ + Zi for 
A', B = XZ2 + B', we 



Notice that H E Iq and XH + 'Pi^{B) E T imply H E fl. Applying Vuo'Pio to the above equality gives the equality 
([13). ° ■ 

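Finally, we have the following simple technical lemma, which manipulates the operators $\mathcal{P}_{\hat T^\perp}$, $\mathcal{P}_{\hat T}$, $\mathcal{P}_{T_0}$, $\mathcal{P}_{T_0^\perp}$, $\mathcal{P}_{I_0^c}$, $\mathcal{R}$ and $\mathcal{R}^{-1}$. This lemma in particular is used repeatedly below.

Lemma 5. For any $X \in \mathbb{R}^{p\times n}$ and $Z \in I_0^c$, we have

$$\mathcal{P}_{T_0}\mathcal{R}\left(\mathcal{P}_{I_0^c}\mathcal{P}_{\hat T}(X)\right) = \mathcal{R}\left(\mathcal{P}_{I_0^c}\mathcal{P}_{\hat T}(X)\right) \qquad (14)$$

$$\mathcal{P}_{\hat T}\mathcal{R}^{-1}\left(\mathcal{P}_{T_0}\mathcal{R}(Z)\right) = \mathcal{P}_{\hat T}(Z) \qquad (15)$$

$$\mathcal{P}_{\hat T^\perp}\mathcal{R}^{-1}\left(\mathcal{P}_{T_0^\perp}\mathcal{R}(Z)\right) = \mathcal{R}^{-1}\left(\mathcal{P}_{T_0^\perp}\mathcal{R}(Z)\right). \qquad (16)$$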

Proof: For the first equality, we have 



= n {Vx^Vu, {X)) + VfU {Vu^^ {X)VVio {V^) 

= n {Vi^Vuo {X)) + VfVu„^ {x)v (vx^y 

= n {Vx^Vu, {X)) + Vu^^ {X)VN^V^ 
= n {Vx^^Vuo {X)) + Vu^. {x)v {vi^Y 

= n{Vx^^Vf{x)), 

where we use Lemma [3] 

For Z E Iq, denote Z = TZ{Z). The second equality is given by 

VfU-' (Vf^iz)) = Vfn-'(vuAz) + Vu^^Vy^{z)) 

= VfVuo {z) + Vu^.n-' {Vu^^ {z)%v^) vv^ 

= Vuo {Z) + Pt/- {Z)V^V^Vi.V^ 

= Vu. [Z] + Vu^^ {Z)V^V^%NV^ 

= Vu^Z)+Vu^^{Z)V^NV^ 

= Vu^Z)+Vu^^{Z)Vx^V^ 

= Vuo{Z)+Vu^.{Z)VV^ 

= Vf{Z). 

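The third equality is given by

$$\begin{aligned}
\mathcal{P}_{\hat T^\perp}\mathcal{R}^{-1}\left(\mathcal{P}_{T_0^\perp}(\bar Z)\right) &= (\mathcal{I}-\mathcal{P}_{\hat T})\mathcal{R}^{-1}(\mathcal{I}-\mathcal{P}_{T_0})(\bar Z) \\
&= Z - \mathcal{R}^{-1}\left(\mathcal{P}_{T_0}(\bar Z)\right) - \mathcal{P}_{\hat T}(Z) + \mathcal{P}_{\hat T}\mathcal{R}^{-1}\left(\mathcal{P}_{T_0}(\bar Z)\right) \\
&= Z - \mathcal{R}^{-1}\left(\mathcal{P}_{T_0}(\bar Z)\right) - \mathcal{P}_{\hat T}(Z) + \mathcal{P}_{\hat T}(Z) \\
&= Z - \mathcal{R}^{-1}\left(\mathcal{P}_{T_0}(\bar Z)\right) = \mathcal{R}^{-1}\left(\mathcal{P}_{T_0^\perp}(\bar Z)\right),
\end{aligned}$$

where $\bar Z = \mathcal{R}(Z)$ and we make use of the second equality. ■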
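The next step is important: we now prove Lemma 1 stated above, showing that if $\Delta_1$ and $\Delta_2$ are feasible perturbations, then $\mathcal{P}_{\hat T}(\Delta_1)$ must satisfy an additional constraint. Using this, we are then able to relax the equality constraint of the certificate to an inequality. As pointed out earlier, the idea of obtaining conditions for a dual certificate with a relaxed equality constraint has appeared earlier, first in [9] and then also in [2], [28]. The following constraints, however, are new, as are the relaxed dual certificate constraints. We restate Lemma 1 here.

Lemma 1. Suppose $\Delta_1, \Delta_2 \in \mathbb{R}^{p\times n}$ are feasible perturbations, i.e., they satisfy $\mathcal{P}_\Omega(\Delta_1) + \mathcal{P}_\Omega(\Delta_2) = 0$. When Eq. (7) holds, we have

$$\|\mathcal{P}_{I_0^c}\mathcal{P}_{\hat T}(\Delta_1)\|_F \le \sqrt{\frac{2pn_1}{m_0}}\left(\|\mathcal{P}_{\hat T^\perp}(\Delta_1)\|_* + \|\mathcal{P}_{I_0^c}(\Delta_2)\|_{1,2}\right). \qquad (17)$$

Proof: We have the following chain of inequalities:

$$\begin{aligned}
\|\mathcal{P}_{I_0^c}(\Delta_2)\|_{1,2} &\ge \|\mathcal{P}_{I_0^c}(\Delta_2)\|_F \ge \|\mathcal{P}_{\Omega_0}(\Delta_2)\|_F = \|\mathcal{P}_{\Omega_0}(\Delta_1)\|_F \\
&\ge \|\mathcal{P}_{\Omega_0}\mathcal{P}_{\hat T}(\Delta_1)\|_F - \|\mathcal{P}_{\Omega_0}\mathcal{P}_{\hat T^\perp}(\Delta_1)\|_F \\
&\ge \|\mathcal{P}_{\Omega_0}\mathcal{P}_{\hat T}(\Delta_1)\|_F - \|\mathcal{P}_{\hat T^\perp}(\Delta_1)\|_*.
\end{aligned}$$

On the other hand, $\mathcal{R}\left(\mathcal{P}_{I_0^c}\mathcal{P}_{\hat T}(\Delta_1)\right) \in T_0$ by the first equality in Lemma 5. It follows that

$$\begin{aligned}
\|\mathcal{P}_{\Omega_0}\mathcal{P}_{\hat T}(\Delta_1)\|_F^2 &= \left\langle \mathcal{P}_{T_0}\mathcal{R}\left(\mathcal{P}_{I_0^c}\mathcal{P}_{\hat T}(\Delta_1)\right),\ \mathcal{P}_{T_0}\mathcal{P}_{\Omega_0}\mathcal{P}_{T_0}\left(\mathcal{R}\,\mathcal{P}_{I_0^c}\mathcal{P}_{\hat T}(\Delta_1)\right)\right\rangle \\
&\ge \frac{m_0}{2pn_1}\|\mathcal{P}_{I_0^c}\mathcal{P}_{\hat T}(\Delta_1)\|_F^2,
\end{aligned}$$

where the last inequality uses Eq. (7). Collecting these facts, we obtain

$$\|\mathcal{P}_{\hat T^\perp}(\Delta_1)\|_* + \|\mathcal{P}_{I_0^c}(\Delta_2)\|_{1,2} \ge \sqrt{\frac{m_0}{2pn_1}}\,\|\mathcal{P}_{I_0^c}\mathcal{P}_{\hat T}(\Delta_1)\|_F.$$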



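We now turn to the proof of Proposition 1.

Proof (of Proposition 1): The first part of the proof (the proof of non-strict success) is standard. To prove that $(\hat L, \hat C)$ is optimal to (1), we need to show that any other feasible solution $(\hat L + \Delta_1, \hat C + \Delta_2)$ with $\mathcal{P}_\Omega(\Delta_1) + \mathcal{P}_\Omega(\Delta_2) = 0$ cannot have an objective value lower than that of $(\hat L, \hat C)$. Take $W_1 \in \hat T^\perp$ such that $\|W_1\| = 1$, $\langle W_1, \mathcal{P}_{\hat T^\perp}\Delta_1\rangle = \|\mathcal{P}_{\hat T^\perp}\Delta_1\|_*$, and $W_2 \in I_0^c$ such that $\|W_2\|_{\infty,2} = 1$, $\langle W_2, \mathcal{P}_{I_0^c}\Delta_2\rangle = \|\mathcal{P}_{I_0^c}\Delta_2\|_{1,2}$. Then $\hat U\hat V^\top + W_1$ is a subgradient of $\|\hat L\|_*$ and $\mathcal{P}_{I_0}(Q) + \lambda W_2$ is a subgradient of $\lambda\|\hat C\|_{1,2}$. Notice that

$$\langle \mathcal{P}_{\hat T}\mathcal{R}^{-1}(D), \Delta_1\rangle = \langle \mathcal{R}^{-1}(D), \mathcal{P}_{I_0^c}\mathcal{P}_{\hat T}\Delta_1\rangle.$$

Therefore, we have

$$\begin{aligned}
&\|\hat L + \Delta_1\|_* + \lambda\|\hat C + \Delta_2\|_{1,2} - \|\hat L\|_* - \lambda\|\hat C\|_{1,2} \\
&\overset{(1)}{\ge} \langle \hat U\hat V^\top + W_1, \Delta_1\rangle + \langle \mathcal{P}_{I_0}(Q) + \lambda W_2, \Delta_2\rangle \\
&\overset{(2)}{=} \|\mathcal{P}_{\hat T^\perp}(\Delta_1)\|_* + \lambda\|\mathcal{P}_{I_0^c}(\Delta_2)\|_{1,2} + \langle \hat U\hat V^\top - Q, \Delta_1\rangle + \langle \mathcal{P}_{I_0}(Q) - Q, \Delta_2\rangle \\
&\overset{(3)}{\ge} \left(1 - \|\mathcal{P}_{\hat T^\perp}(Q)\|\right)\|\mathcal{P}_{\hat T^\perp}(\Delta_1)\|_* + \left(\lambda - \|\mathcal{P}_{I_0^c}(Q)\|_{\infty,2}\right)\|\mathcal{P}_{I_0^c}(\Delta_2)\|_{1,2} - \|D\|_F\|\mathcal{P}_{I_0^c}\mathcal{P}_{\hat T}(\Delta_1)\|_F \\
&\overset{(4)}{\ge} \left(1 - \|\mathcal{P}_{\hat T^\perp}(Q)\|\right)\|\mathcal{P}_{\hat T^\perp}(\Delta_1)\|_* + \left(\lambda - \|\mathcal{P}_{I_0^c}(Q)\|_{\infty,2}\right)\|\mathcal{P}_{I_0^c}(\Delta_2)\|_{1,2} \\
&\qquad - \sqrt{\frac{2pn_1}{m_0}}\|D\|_F\left(\|\mathcal{P}_{\hat T^\perp}(\Delta_1)\|_* + \|\mathcal{P}_{I_0^c}(\Delta_2)\|_{1,2}\right) \\
&= \left(1 - \|\mathcal{P}_{\hat T^\perp}(Q)\| - \sqrt{\frac{2pn_1}{m_0}}\|D\|_F\right)\|\mathcal{P}_{\hat T^\perp}(\Delta_1)\|_* + \left(\lambda - \|\mathcal{P}_{I_0^c}(Q)\|_{\infty,2} - \sqrt{\frac{2pn_1}{m_0}}\|D\|_F\right)\|\mathcal{P}_{I_0^c}(\Delta_2)\|_{1,2} \\
&\overset{(5)}{\ge} 0,
\end{aligned}$$

where inequality (1) follows by the definition of subgradient, equality (2) follows due to the fact that $Q \in \Omega$, inequality (3) uses the property of dual norms, inequality (4) follows from Lemma 1, and finally inequality (5) uses the assumption of the proposition at hand. Therefore, $(\hat L, \hat C)$ is an optimal solution.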
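The dual-norm step used in inequality (3) can be checked numerically. The following is a minimal sketch with random matrices; the helper names `norm_1_2` and `norm_inf_2`, the sizes and the seed are assumptions made for this illustration. It verifies the two pairings (spectral with nuclear, and $\|\cdot\|_{\infty,2}$ with $\|\cdot\|_{1,2}$) and builds a matrix $W$ attaining $\langle W, \Delta\rangle = \|\Delta\|_{1,2}$ with $\|W\|_{\infty,2} = 1$, in the same way $W_2$ is chosen in the proof.

```python
import numpy as np

def norm_1_2(X):
    """Sum of the l2 norms of the columns."""
    return np.linalg.norm(X, axis=0).sum()

def norm_inf_2(X):
    """Largest l2 norm among the columns."""
    return np.linalg.norm(X, axis=0).max()

rng = np.random.default_rng(1)
Q = rng.standard_normal((6, 8))
Delta = rng.standard_normal((6, 8))
inner = np.sum(Q * Delta)

# <Q, Delta> <= ||Q|| * ||Delta||_*  (spectral norm paired with nuclear norm)
spectral_bound = np.linalg.norm(Q, 2) * np.linalg.norm(Delta, "nuc")
# <Q, Delta> <= ||Q||_{inf,2} * ||Delta||_{1,2}
column_bound = norm_inf_2(Q) * norm_1_2(Delta)

# Column-normalised Delta attains the (1,2) norm and has unit (inf,2) norm.
W = Delta / np.linalg.norm(Delta, axis=0)

print(inner, spectral_bound, column_bound)
```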

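Now suppose both inequalities are strict. If the last inequality is strict, then $(\hat L, \hat C)$ is the unique optimal solution. Otherwise, we must have

$$\|\mathcal{P}_{\hat T^\perp}(\Delta_1)\|_* = \|\mathcal{P}_{I_0^c}(\Delta_2)\|_{1,2} = 0.$$

Since $\mathcal{P}_\Omega(\Delta_1) = -\mathcal{P}_\Omega(\Delta_2)$ and $\Delta_2 \in I_0$, we have $\mathcal{P}_{\Omega_0}\mathcal{P}_{I_0^c}(\Delta_1) = -\mathcal{P}_{\Omega_0}\mathcal{P}_{I_0^c}(\Delta_2) = 0$. Because $\Delta_1 \in \hat T$, we have $\mathcal{R}(\mathcal{P}_{I_0^c}(\Delta_1)) \in T_0$ by the first equality in Lemma 5. It follows that

$$\mathcal{P}_{T_0}\mathcal{P}_{\Omega_0}\mathcal{P}_{T_0}\mathcal{R}(\mathcal{P}_{I_0^c}(\Delta_1)) = \mathcal{P}_{T_0}\mathcal{P}_{\Omega_0}\mathcal{R}(\mathcal{P}_{I_0^c}(\Delta_1)) = 0.$$

Applying $(\mathcal{P}_{T_0}\mathcal{P}_{\Omega_0}\mathcal{P}_{T_0})^{-1}$ to both sides of the above equality gives $\mathcal{P}_{I_0^c}(\Delta_1) = 0$, which means $\Delta_1 \in I_0$.

Furthermore, we have

$$\mathcal{P}_{U_0^\perp}(\Delta_1) = \Delta_1 - \mathcal{P}_{U_0}(\Delta_1) = \mathcal{P}_{\hat T}(\Delta_1) - \mathcal{P}_{U_0}(\Delta_1),$$

and thus $\mathcal{P}_{U_0^\perp}(\Delta_1) \in \mathcal{P}_{\hat V}$. This implies $\mathcal{P}_{U_0^\perp}(\Delta_1) = 0$, because $\mathcal{P}_{U_0^\perp}(\Delta_1) \in I_0$ and $\mathcal{P}_{\hat V} \cap I_0 = \{0\}$. Therefore $\Delta_1 \in \mathcal{P}_{U_0}$. On the other hand, if $(\hat L + \Delta_1, \hat C + \Delta_2)$ is an optimal solution, we must have $\Delta_2 \in \Omega$; otherwise $(\hat L + \Delta_1, \hat C + \mathcal{P}_\Omega(\Delta_2))$ will have strictly lower objective value. Putting all together, we conclude that for $(\hat L + \Delta_1, \hat C + \Delta_2)$ to be optimal, we must have $\Delta_1 \in I_0 \cap \mathcal{P}_{U_0}$, $\Delta_2 \in I_0 \cap \Omega$, and $\mathcal{P}_{I_0^c}(\hat L + \Delta_1) = \mathcal{P}_{I_0^c}(\hat L) = L_0$, where the last equality is shown in Lemma 3. This completes the proof. ■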



C. Technical Lemmas 



In this subsection we collect several technical lemmas, which are required for constructing the dual certificate. These lemmas bound the norms of certain random operators/matrices.

Our basic tool for bounding matrix norms is the Noncommutative Bernstein Inequality. The version presented below is from [28], except that here we assume the sampling without replacement model; this is possible because it has been shown that the Noncommutative Bernstein Inequality still holds under this model.

Lemma 6 (Noncommutative Bernstein Inequality [28, Theorem 3.4]). Let $X_1, \ldots, X_L$ be zero-mean random matrices of dimension $d_1 \times d_2$ sampled uniformly without replacement from some finite set. Suppose $\sigma_k^2 = \max\{\|\mathbb{E}X_kX_k^\top\|, \|\mathbb{E}X_k^\top X_k\|\}$ and $\|X_k\| \le M$ a.s. for all $k$. Then for any $\tau > 0$,

$$\mathbb{P}\left[\left\|\sum_{k=1}^L X_k\right\| > \tau\right] \le (d_1+d_2)\exp\left(\frac{-\tau^2/2}{\sum_{k=1}^L\sigma_k^2 + M\tau/3}\right).$$

Remark. Observe that the right hand side is less than $(d_1+d_2)\exp\left(-\frac{3}{8}\tau^2 / \sum_{k=1}^L\sigma_k^2\right)$ as long as $\tau \le \frac{1}{M}\sum_{k=1}^L\sigma_k^2$.

The next lemma states that for a fixed matrix in $T_0$, the operator $\frac{pn_1}{m_0}\mathcal{P}_{T_0}\mathcal{P}_{\Omega_0} - \mathcal{I}$ does not increase the matrix infinity norm. Its proof largely follows that of [28, Lemma 3.6] and uses the Noncommutative Bernstein Inequality, with the modification that sampling without replacement is assumed.

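Lemma 7 ([28, Lemma 3.6]). Suppose $\Omega_0 \subseteq [p] \times [n_1]$ is a set of $m_0$ entries sampled uniformly at random without replacement. Let $Z \in T_0$ be a fixed $p \times n_1$ matrix. Then for all $\beta > 2$,

$$\left\|\frac{pn_1}{m_0}\mathcal{P}_{T_0}\mathcal{P}_{\Omega_0}(Z) - Z\right\|_\infty \le \sqrt{\frac{8\beta\mu_0 r(n_1+p)\log(n_1+p)}{3m_0}}\,\|Z\|_\infty$$

with probability at least $1 - 2\max\{n_1,p\}^{2-\beta}$, provided that $m_0 > \frac{8}{3}\beta\mu_0 r(n_1+p)\log(n_1+p)$.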
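The Noncommutative Bernstein tail of Lemma 6, on which these lemmas rest, and the simplified form in its Remark can be compared directly. The sketch below is an illustration only; the function names and the numeric values of $d_1$, $d_2$, $M$ and $\sum_k \sigma_k^2$ are assumptions chosen for the example. The simplified bound dominates exactly when $\tau \le \frac{1}{M}\sum_k\sigma_k^2$, because $8S \ge 6(S + M\tau/3)$ is equivalent to $\tau \le S/M$.

```python
import math

def bernstein_tail(tau, sigma_sq_sum, M, d1, d2):
    """Right-hand side of the Noncommutative Bernstein Inequality."""
    return (d1 + d2) * math.exp(-(tau ** 2 / 2) / (sigma_sq_sum + M * tau / 3))

def simplified_tail(tau, sigma_sq_sum, d1, d2):
    """Simplified bound, valid for tau <= sigma_sq_sum / M."""
    return (d1 + d2) * math.exp(-3 * tau ** 2 / (8 * sigma_sq_sum))

d1, d2, M, S = 50, 40, 0.5, 4.0   # S / M = 8 is the largest admissible tau
for tau in (0.5, 1.0, 4.0, 8.0):
    print(tau, bernstein_tail(tau, S, M, d1, d2), simplified_tail(tau, S, d1, d2))
```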

The next lemma bounds the operator norm of $\frac{pn_1}{m_0}\mathcal{P}_{\Omega_0}(Z) - Z$ by the infinity norm of $Z$. Again one can adapt the proof in [28] to the sampling without replacement model.

Footnote 1 (on the Noncommutative Bernstein Inequality under sampling without replacement): Intuitively, this is because sampling without replacement implies negative association in some sense, although this argument does not really work as a rigorous proof due to the lack of total order for matrices; the proof in [11] uses a coupling argument.

Lemma 8 ([28, Theorem 3.5]). Suppose $\Omega_0 \subseteq [p] \times [n_1]$ is a set of $m_0$ entries sampled uniformly at random without replacement and let $Z$ be a fixed $p \times n_1$ matrix. Then for all $\beta > 1$,

$$\left\|\frac{pn_1}{m_0}\mathcal{P}_{\Omega_0}(Z) - Z\right\| \le \sqrt{\frac{8\beta pn_1\max\{p,n_1\}\log(n_1+p)}{3m_0}}\,\|Z\|_\infty$$

with probability at least $1 - (n_1+p)^{-\beta}$, provided that $m_0 > 6\beta\min\{n_1,p\}\log(n_1+p)$.

The bounds in the next two lemmas are new. The first one states that for a fixed matrix in $T_0$, the operator $\frac{pn_1}{m_0}\mathcal{P}_{T_0}\mathcal{P}_{\Omega_0}\mathcal{P}_{T_0} - \mathcal{P}_{T_0}$ does not increase its infinity-two norm too much. The proof uses the Noncommutative Bernstein Inequality and is given in Appendix A.

Lemma 9. Suppose $p \le n_1$ and $\Omega_0 \subseteq [p] \times [n_1]$ is a set of $m_0$ entries sampled uniformly at random without replacement. For any $Z \in T_0$ and $\beta > 1$, we have



pni 
mo 



VfVn.VfSZ)-VfSZ) 



To 



2-2,3 



< 



16 



/3^/S^pibg^||Z|l 



with probability at least 1 — (2ni) 

The next lemma states that the operator ^^V^^ '^'^^ ^'^^ increase the infinity-two norm of a matrix whose column 
space is the same as Uo- The proof is given in Appendix [B] 

Lemma 10. Suppose flo G [p] x [rii] is a set of mo entries sampled uniformly at random without replacement. For 
any matrix Z G IR"^'', we have 



pni 
mo 



< 1 



16/3/ior(ni + p)plog(ni + p) 



3mo 



\Uonvx^ {z^)\ 



oo,2 



with probability at least 1 — {rii + p)^ ^'^ provided mo > ^P^oi^i^i + p) log(ni + p). 



D. Constructing the Dual Certificate via the Golfing Scheme 

In this section, we construct the dual certificate, which builds on the dual certificate used in [37] and utilizes the Golfing Scheme.

Recall that Lemma 4 guarantees the existence of an $\hat H$ satisfying Eq. (13). Following [37], let



Ai 
A2 



and let ^ = 



1) 

2) 
3) 
4) 



= XPuo{H) = UoVi,{V^) 
= UV^ + XH - Ai - A2, 

By 1371 Proof of Theorem 4], when < 1, the following holds. 



\\VrAQ)\\ = 



< A 



771, - 



1 — tp 



772; 



|^is(g)| 



00,2 



\UoVx^,{V^) 



A2 



< 



fJ-gr 

loo, 2 — y ni ' 1 — j/i 

This dual certificate $\bar Q$ satisfies all the conditions in Proposition 1 except the requirement of being in $\Omega$. Moreover, this requirement can only potentially fail on the columns in $I_0^c$. Thus, there is a natural candidate solution: build $Q$ identical to $\bar Q$ on the columns in $I_0$, and for the columns in $I_0^c$, sample the certificate $\bar Q$ according to $\Omega$. That is, $Q = \mathcal{P}_{I_0}(\bar Q) + Z$, where $Z = \mathcal{R}^{-1}\left(\frac{pn_1}{m_0}\mathcal{P}_{\Omega_0}\mathcal{R}\,\mathcal{P}_{I_0^c}(\bar Q)\right)$. Evidently, $Q \in \Omega$, while $\mathbb{E}[Z] = \mathcal{P}_{I_0^c}(\bar Q)$, and hence $\mathbb{E}[Q] = \bar Q$, which satisfies all the required properties. We may then use matrix concentration inequalities to show that with high probability, $Q$ itself satisfies the required conditions. Note that not requiring the random part of $Q$, namely $\mathcal{P}_{I_0^c}(Q)$, to satisfy any equality constraints is critical (whence the need for the alternative sufficient conditions of Proposition 1).

The details are slightly more complicated, and require us to use the Golfing Scheme to construct the desired dual certificate. For technical reasons, we need to modify our sampling model as follows.

Footnote 2: This is because we need a certain amount of independence in $\Omega$, due to the fact that the bounds in Lemmas 7-10 are not uniform in $Z$. See [9], [11] for discussion of this issue.

Sampling with batch replacement model: We assume that $\Omega$ consists of $s$ batches of entries sampled from the non-corrupted columns, with each batch of size $q$, where the sampling operation proceeds as follows. We draw the first batch $\Omega_1$ of $q$ entries uniformly at random from $[p] \times [n_1]$ without replacement. Then we replace all the entries in the first batch and draw the second batch $\Omega_2$ of $q$ entries independently of the first batch. We repeat this procedure $s$ times. In this way, we obtain a total of $m = q \times s$ (perhaps non-distinct) entries $\Omega = \bigcup_{i=1}^s \Omega_i$. Notice that every single batch contains distinct elements, while the batches are independent of each other.
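The sampling with batch replacement model can be sketched in a few lines. This is an illustrative implementation under stated assumptions: the function name `sample_with_batch_replacement`, the zero-based index pairs, and the toy sizes are choices made for this example only.

```python
import random

def sample_with_batch_replacement(p, n1, s, q, seed=0):
    """Draw s independent batches; each batch holds q distinct entries of [p] x [n1]."""
    rng = random.Random(seed)
    all_entries = [(i, j) for i in range(p) for j in range(n1)]
    # random.sample draws without replacement inside a batch; successive
    # calls are independent, so entries may repeat across batches.
    return [rng.sample(all_entries, q) for _ in range(s)]

batches = sample_with_batch_replacement(p=5, n1=4, s=3, q=6)
union = set().union(*map(set, batches))
print(len(union), "distinct entries out of", sum(len(b) for b in batches), "draws")
```

The union of the batches can be smaller than $q \times s$, which is the "perhaps non-distinct" caveat in the model.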

In Appendix C, we argue the following: if there exists a dual certificate with high probability under the sampling with batch replacement model, then the probability that there exists a dual certificate under the sampling without replacement model with the same $m$ can only be higher. Therefore, we only need to construct the dual certificate under the sampling with batch replacement model.

Since by assumption m — ppni satisfies 

m > mt^lr^ni log^(4ni), (18) 

we may choose s = [2 • |log(4ni)] and q = m/s > ci/ipr^yii log(47ii) for some constant ci sufficiently lareg. 
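Define

$$Y_0 = 0, \qquad Y_i = Y_{i-1} + \frac{pn_1}{q}\mathcal{P}_{\Omega_i}\left(\mathcal{P}_{T_0}(\bar Q_{I_0^c}) - \mathcal{P}_{T_0}(Y_{i-1})\right), \quad i = 1, \ldots, s,$$

$$Q = \mathcal{P}_{I_0}(\bar Q) + \mathcal{R}^{-1}(Y_s),$$

where $\bar Q_{I_0^c} = \mathcal{R}\left(\mathcal{P}_{I_0^c}\bar Q\right)$.

Notice that under the above construction, $\mathcal{P}_{I_0}(Q) = \mathcal{P}_{I_0}(\bar Q)$. Thus we are using the Golfing Scheme only for the $I_0^c$ part of $Q$. We show now that $Q$ is the desired dual certificate, i.e., it satisfies the conditions of success in Proposition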
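The behaviour of a golfing-type recursion can be observed on a toy problem. The sketch below is a generic illustration, not the exact construction of this section: it assumes a target $W$ lying in a tangent space $T = \{UA^\top + BV^\top\}$, uses the update $Y_i = Y_{i-1} + \frac{pn}{q}\mathcal{P}_{\Omega_i}\mathcal{P}_T(W - Y_{i-1})$, and tracks the error $\|\mathcal{P}_T(W - Y_i)\|_F$. The names `proj_T` and `golfing`, the sizes and the batch size are assumptions for this example.

```python
import numpy as np

def proj_T(Z, U, V):
    """Projection onto T = {U A^T + B V^T}: P_U Z + Z P_V - P_U Z P_V."""
    UtZ = U.T @ Z
    return U @ UtZ + (Z @ V) @ V.T - U @ (UtZ @ V) @ V.T

def golfing(W, U, V, s, q, rng):
    """Run s golfing iterations towards the target W (assumed to lie in T)."""
    p, n = W.shape
    Y = np.zeros_like(W)
    errors = [np.linalg.norm(proj_T(W - Y, U, V))]
    for _ in range(s):
        # One batch of q distinct entries; batches are independent of each other.
        idx = rng.choice(p * n, size=q, replace=False)
        mask = np.zeros(p * n, dtype=bool)
        mask[idx] = True
        residual = proj_T(W - Y, U, V)
        Y = Y + (p * n / q) * residual * mask.reshape(p, n)
        errors.append(np.linalg.norm(proj_T(W - Y, U, V)))
    return Y, errors

rng = np.random.default_rng(0)
p, n, r = 40, 40, 2
U, _ = np.linalg.qr(rng.standard_normal((p, r)))
V, _ = np.linalg.qr(rng.standard_normal((n, r)))
W = U @ V.T
Y, errors = golfing(W, U, V, s=6, q=800, rng=rng)
print([f"{e:.2e}" for e in errors])
```

With sufficiently large batches the printed errors would be expected to shrink by a roughly constant factor per batch, which is the geometric decrease exploited in the proof; with a batch covering every entry the error vanishes after one step.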

To simplify the subsequent presentation, we introduce one more piece of notation and define 

K = /iorlog^(4ni). 

We collect below several inequalities which will be used in the proof of the dual certificate. (These inequalities are just condensed forms of Lemma 2 and Lemmas 7-10.) Let $c$ denote some constant sufficiently large, and suppose $\rho$ obeys the lower bound in Theorem 1 with a sufficiently large $\eta_1$. By Lemma 2, the following holds w.h.p. for each $i$:



pni 



la 



< c 




By Lemma [t] the following holds w.h.p. for each i and any Z <eT. 

pni 



< c 




By Lemma |8] the following holds w.h.p. for each i and any Z, 



\ 9 



<6, 



IniK 
t^orp 



< 



1 j pni 



By Lemma [9] the following holds w.h.p. for each i and any Z <eT, 



pni 



'PfPh^Tfo^Z)-Z 



K 

< c 

00,2 pVp 



By Lemma 10 the following holds w.h.p. for each i. 




-)||^o7^7'xs(n 



Also notice that ||C/o7^■PIg (^^^)||^ 2 = \\UonVi^^ {V^) e^\\^ = max, \\Vi^^ {V^) 

We are ready to show that Q satisfies the condition of success. 
Proposition 2. If ni> 32, m satisfies Eq. \\2i\ , and A satisfies 

ci (1 



(19) 



(20) 



(21) 



(22) 



(23) 



/K 



/K 



Mo'' 



K \ log(27ti) ~ 
P\/P ) P 



(1-7) 



< A < 



771 if ' 



(24) 
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for some Ci and C2 sufficiently small, then the certificate Q constructed above satisfies 

(a) Qen, 

and satisfies the remaining conditions of Proposition [7] with high probability: 

(b) Vf{Q) - UV^ = VfR-\D)for some \\D\\p < i^pA; 

(c) ||^r-(Q)|| < h 

(d) Vx,{Q) & \(&iC); 



(e) 



00,2 



<!• 



Proof: We know that e fi by Lemma [4} and Yg e il hy construction. Therefore Q = XH + TZ^^(Ys) G ft. 
Denote Xq = TZ (Vx^Q) . We first derive an equaUty which we make use of in the rest of the proof. For i = 1, 2, . . . , s, 
we have 



pni 



(25) 



n 



pni 



(26) 



where the last equahty is by recursion. 

Also notice that by our assumption on A, we have Xy/yii < -^\f^ < g^- It follows from fjT, Lemma 7] that 

1- V> 1-AV^> ^. 

Step 1: Let D = Vf^{Ys) - Vf^iQx-)- By the second equality of Lemma Is] we have VfR^^iY^) = 
VfTZ-^iD) +VfVx§iQ). Therefore, 

Vf{Q)~uv^ = rf(Vxo{Q) + n-\Y,))^uv^ 

= {VfPx[Q) + Vfn-\D) + VfVx^^iQ)) ~ uv^ 

= Vfn-\D), 

where the last equality uses the properties of Q. 

The rest of this step is standard when one uses the Golfing scheme. Recall that we want Q to be close to UV^ 
on f, and notice that Vf{Q) - UV^ = Vf^iY^) - Vf^^iQx-)- So Vf^^{Yi) - Vf^{Qx-^ can viewed as the "error" 
after the i iterations of the Golfing Scheme. The proof consists of showing that the error decreases geometrically 
at each iteration. Indeed, we have 



(1) 
< 



< 



< 



(2) 

< 



< 



(3) 
< 



Vf'R-\D)\\p 
pni 



n 

1 
2 
1 
2 

1 
2 

1 
2 
1 

2 




-^0 lloo,2 










K 




/ "1 1 


\! ni 




1 - 


j 


1 ni 


2A, 







(27) 
(28) 
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Here, inequality (1) uses Eq.([T9]l, inequality (2) is due to 1371 Proof of Theorem 4, Step 5], and inequality (3) is 
proved in Step 4 below. Since s = 2 • | log(4ni), and y/p > the last line is bounded as 



ni A < 



2-|log(4ni) 



(29) 



< 2-21°S2(4"l)^^ 



< 



(4ni)2 



Step 2: Modulo some technical issues, and some previously established facts from ll37l . this step is also 



standard when one uses the Golfing Scheme; the key is showing 



is small. Notice that 



where the last equality uses the third equality in Lemma |5] Therefore, we have: 

Vf^{Q)\ = \\Vf^Vi,{Q)+Vf^n-\Ys)\\ 



< 



XH 



We bound each of the four terms on the right hand side, separately. 



By ll37l Lemma 2], the first term is bounded as 



XH 



80- 



The second term can be bounded using results from Step 1 



\rf„iYs)-Vt{Qx.)\ 



< 



< 



< 



(a) 

< 



< 



(b) 
< 




1 
1 

2^27^ 



MoT- _^ X^jn 

Til 1 — 1p \j Til 

1 - -0 

+ 2X^/rn 



80' 



where (a) is due to /io < ^ and (b) is due to ni > 32. 

For the third term, let V^a = TZVxi^{V^). We have the following chain of inequalities. The key steps are 
inequality (1) and (2) below, where we bound || || with using Lemma [s] and then bound || || using Lemma |7| 
these are where we need the fti's to be independent of each other. 
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< 



(a) 

< 



< 



< 



1=1 ^ 



E 

s 

E6 



pni. 



IniK 



|Pto(Qxs)-^'ro(^-i) 



E 



(6) 

< 64 



< 6a 



< 6a 



'niK 



" 1 
E^ ll^ro(<32:s' 



IniK 
fJ-nrp 



1 / pni 



l^ro(Qx5)| 



l^fl,^(A2)|L) 



IniK 
Porp 



('^) 1 / pni / /igT^ 



1 A/TTi K 

1 , [k 



porp l-ip 



(30) 



id) 1 
< - 



12 ■ 



1 / P 



•ynK 



< 



Here, inequality (a) follows from Eq.(pT|), inequality (6) from Eq 26 and ( |20] l, and inequality (c) from the incoherence 
assumption and the following inequalities (similar to (jT Proof of Theorem 7, Step 5]): 

ll^ro^(A2)L,, 



max 

l<z<ni 



max 

l<i<ni 



(/-C/oC/o^)(AH)l/t/^ 



1 + 



< max / — UqUq 

Ki<ni 



XH 



\vv' 



i=l 



< 



1 -%l> 

Finally (c?) is due to our choice A. 



(31) 
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The fourth term can be bounded in a way similar to [37] Theorem 7, Step 4] as follows. 



< 



(a) 
< 



< 
< 
< 



Vf^ {Uorx^iv^))\\ + rx^ryiVyVigVyyvyiXH) 



'VyVuA^H) 



A 



H 



«+l-^ 

1 - V 

2^i, 



where (a) follows from [37, Theorem 7, Step 4]. 

Collecting the bounds for the four terms, we have 



< 



< 
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A 


,16" 
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1 


5 • 


80 


^16 



1 

80 



2- 



80 



Step 3: Using the properties of Q, we have Vi„iQ) = Vi^iQ) = XH e A0(C). 
Step 4: We have 



= m\ 



< 



< 



Epni 
q 

q 

i=l ^ 
pni 



{VfSQT^,)-VfSY^-i))\ 



oo,2 



i=3 



oo,2 



We bound the three terms on the right hand side separately. (The reason for doing this is that higher order terms in the above sum are easier to bound, so we need to isolate the first two terms in order to get tighter bounds.) The new inequalities we derive on the $\|\cdot\|_{\infty,2}$ norm are critical here.

We bound the first term using Lemma 10. We have:



l^n.^ro(Qx5)| 



< 



(a) 
< 



q 

q 



< 




Here, inequality (a) is due to Eqs. ( |23| ) and ( (3T| ). In the first equality above, one might be tempted to write 
ll-Po 'P-f- (Qx") II < WP-t- (QiOII and then use the established bound on ||Qtc|| „ from ||37| , but this leads 

II S2i To V^-^o/lloo,2 — II To ^^-^0 ' Hoc, 2 _ ll^-^olloo,2 " ' 

to a looser bound. Instead, we bound the "UqVjc part" and the "Vf^ TZ{-K2) part" of Pf^Vf^ (Q^e) separately. 
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The second term is bounded using Lemma [9] We have: 
pni 



loo, 2 



SI2 



pni 

q 



00,2 



W log(2ni) 



< 



W log(2ni) 



< 



pni 

q 

K 
VP 



log(2ni) 



log(2»i) K 

C7 — - \\Pfn{A2)\\^ 2 



p pVp 



Klog{2ni) filr^ , i^log(2ni) 



PVP 



pni 



cs- 



p^-Jv 



pgr 
ni ' 



Here, (6) is due to Eq. ( [20| ) and ( [22) , and (c) follows from the incoherence assumption and ||37l Proof of Theorem 
7, Step 5]. Again, bounding the "UoVj[a part" and the "VfJZ{K2) part" separately in (a) gives a better bound. 

The third term is of high order and thus easier to control than the first two terms. It suffices to use the loose bound $\|M\|_{\infty,2} \le \sqrt{p}\,\|M\|_\infty$. We have:



< 



< 



4=3 



< '^^^pY.\\^^A^rSQ^^.)~'^fSy^^^^^^^ 



1=3 



log(2ni) 



j=3 



i=i ^ ^ 



W log(2ni) 



Cm 



l^ro(Qx5)| 



W log(2ni) ^_ 



< 



C2 



C2- 



u^v^.+Vfn{K2 
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K \og{2ni) 

p'^Vp 



K\og{2ni) Ulr^ , i^log(2ni) 



p2p 



ni 



— Uior 
PWP V "1 



where (a) follows from Eq.(|20|i (here we use the independence between the Cti's again), and (&) is due to the fact 



that cij^ < i 
'-y pp — 2 



K < 1 

pp - 2- 

Collecting the bounds for the three terms, we have 



00.2 
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_ + VPorKlog{2ni) ^ y/jMyrK \og{2ni)\ fjl^ 
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P PVPP PP/ V "1 
'if A' 



C2 



log(2ni) ^ iflog(2ni) ^ Xlog(2ni) \ 

p p^Vp p^Vp J 



Par 
ni 



C3 1 



P Ps/PP / V "1 
^/K \ \[K I Par 



Par , (\og{2ni) K\og{2ni) 
C4 ' 
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Solving, we find 



C3 1 



K 

pVp. 



/p\l ni 



C4 1 



K \ log(2ni) — [JI^ X 
p \ ni 2 



^ A 



^ A > 



C4 1 



C5 



K \ log(2ni) 

pVpJ p 

1 -I- VjI\ ^ 



ni 



> C3 1 



\/~K \ yfK fjMyr 



pVp / \/P V "^1 



as long asl-C6(l + ^^ ^Hli^Iii) 



p A/ (1 ,/) /^o?^ > 0. This is proved in the next section. Notice that when A satisfies 

the above conditions, the inequality (|27|) in Step 1 holds. ■ 



E. Under what conditions is such a A possible? 
In the last section we require a A that satisfies 

1 + ^\ VK 



pVpJ Vp V (1"'') 



(1-7) 



If we know r < f, -f < •y, p > q, then such A exists if 

^1 V^ + ^J wv (1 



< A < ^ 



(32) 
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4^ 



1 

< 



P 



48 V inK 



7 pgr 
1-7) 



C3 I 1 



log(4Tti 



< 1 
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pVp 
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P V (1-7) 
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< 1 - C2 1 



K \ log(4ni) 
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K \ log(4ni) 
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/por - 
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pVpJ p 
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In this case we can choose 



< 



< C5 



C5 



K 

P\/P 



Por 



48 Y 7n/j,o'' log^(4ni) 

Combining the above discussion with Lemma 2, Propositions 1 and 2, and the union bound concludes the proof of our main result, Theorem 1.

F. When can we eliminate the polylog gap?

In this section, we prove Theorem 2, which states that the polylog gap can be eliminated under additional assumptions on the corrupted columns. The bounds in Theorem 2 are stronger than those in Theorem 1 and the proof is simpler.

In Theorem 2 we assume that the corrupted columns too satisfy the same incoherence condition as the non-corrupted columns:



$$\max_{1\le i\le p}\|\mathcal{P}_{U_M}(e_i)\|_2 \le \sqrt{\frac{\mu_0 r}{p}},$$

where $U_M$ is the matrix of left singular vectors of $M$. Since the nonzero columns of $L_0$ are a subset of the columns of $M$, we have $\mathrm{Range}(\mathcal{P}_{U_0}) \subseteq \mathrm{Range}(\mathcal{P}_{U_M})$, and thus $\mathcal{P}_{U_M}$ can be decomposed as



ruM='Puo+'Pu^, 

where Range(7'^i) is a subspace of Rang&{'Pu±), and Uq € MP^'^' 
the incoherence of Pum imphes that of P^± ; that is 



is an orthogonal basis of P^± . Observe that 



max 

l<i<p 



Pu^i^^) 



< max 

2 l<i<p 



{Puo+Pu^){e^) 



max ||P(7M(ei)||2 < 

l<'l<'p 



fJ'pr 
P 



Let us focus on the j th colurmi of C and H; w.l.o.g. we assume Cj ^0. Because C ~ Lq + Cq — L and L e Puo > the 
column space of (7 is a subspace of Range('P[/„). Therefore we can write Cj = Puo{Cj)+Pu±{Cj) = Uox + UqU 
for some column vectors x gW and y G W". It follows that 



^\\U^x\\'^+\\U^y\ 

. Thus we can bound the ith component of Pu^ i^j) 

1 



eju^yl 



< 



(1) 
< 



l + \\udy 

Wuijy] 



eju^\\2\\y\\2 



< 



P 



where in (1) we use ||C^cy||2 
that 



Now we expand ■p^(A2): 

^ro(A2) 



II y II 2 and the Cauchy-Schwarz inequality. Since i and j are arbitrary, it follows 
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nPvoPisPv{Vv'PxsPv)-''Pv'Pu,A>^H) 
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1 + £^ VG'V^ 



VPi.{V^)VoVj. 



Thus, for any (a. b) we have 



|(^f(A2)),, 



< 



eJVuAXH) 



1 + VG'V^ 



< A 



< Aa 



luyr 1 



VV' 



\VV' 



1 + ^ FGV^ 



1 + ^ FGV^ 



I ^11 11^15(1^^)11 \\VoV^eb\ 




p 1 — V V "1 



1 — '0 V ^"^1 



and subsequently 



^fn(A2)| 



< 



pni 



(33) 



With the bound Eq.((33]l on the matrix infinity norm of 7'^^(A2), we can derive tighter bounds in the proof of 
the dual certificate. 

First, we use a dual certificate that is slightly more "economical" than the one we used before. When m — ppni 
satisfies 

m > T]ifilr'^ni^log'^{4ni)/\og{p), 

we may partition fl into s partitions of size q such that s — 5 log(4rii)/ log(p), and q = m/s > ciuiy/pr^ log(4ni). 
Let Q be defined as before except that s and q now have different values. 

With this choice of Q and the new bound ( (33] l, Proposition [2] now becomes 
Proposition 3. Ifni > 32, and A satisfies 



Cl 



log(4ni) / fJ-pr^ 
1-7 



nil - C2 



log(4ni) 



< A < 



1 / 1 



(34) 



for some rj sufficiently small, then Q G 17, and the following holds with high probability 



(a) VfiQ) - UV^ = Vf-R-^iD) for some \\D\\p < \, 

(b) ||7'^^(g)|| < \ 

(c) VxM) e X0iC). 



-X 



(d) 



< 



00,2 



Proof: The proof is based on that of Proposition [2] with only minor modifications; we will point out where 
modifications are needed. We will refer to the proof of Proposition |2] as the "previous proof". 
Similar to the previous proof, we have X^/jfi < ^ and hence 1 ~ > 1 — X^/jn > ^. 

Step 1: Since p > 771/iQr^ log^(4ni)/(y/plog(|))), Eq.pQ]) in the previous proof becomes 



"■lA < (p) 



riiA 



< (p)"3-51og(4ni)/logp^^ 



-| log„(4ni) 
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Step 2 : When the bound ( |33l ) holds, Eq.(|30l) in the previous proof is bounded by 



< 



< 



< 



1 / pni / 



UoV' 



+ \\rfniA^)\ 



1 / pni I / /igr^ ^y/jn / /ipr^ 



pni 1 — -0 y pni 



11 / pni fJ-lr^jn 

— I — A\/^n, 
8 4^' 



where the last inequality holds due to our assumption on A. 
Step 3 : No modification is needed. 
Step 4 : In the previous proof, we write Vx^ iff) 



as three terms and bound them separately. In the 



current case, it turns out that they can be bounded all at once. We have 

= r.lloo,2 



oo,2 
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< 
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pni 
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51og(4ni)/log(p) 
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where (1) is due to Eq.d20|. Solving, we find 



C3log(4ni)/log(p) //i§r2 C3 log(4ni)/ log(p) 



ni 



A^Tn 



ni ~ 2 



^ / _ C4log(4ni)/log(p) tJv^\ ^ C5 log(4ni)/log(p) Llr^ 

\ P \l ni j ~ p 
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C5 log(4rti)/log(p) h^lr^ 
P V 1-7 



/n ( 1 - ^4iog(4rn)/i°g(p) l^^lr^ 
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Algorithm 2 The ALM Algorithm for Robust Matrix Completion 

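input: $\mathcal{P}_\Omega(M) \in \mathbb{R}^{p\times n}$, $\Omega \subseteq [p] \times [n]$, $\lambda$ (assuming $\mathcal{P}_{\Omega^c}(M) = 0$)

initialize: $Y^{(0)} = 0$; $L^{(0)} = 0$; $C^{(0)} = 0$; $E^{(0)} = 0$; $u_0 > 0$; $\alpha > 1$; $k = 0$.

while not converged do

    $(U, S, V) = \mathrm{svd}(M - E^{(k)} - C^{(k)} + u_k^{-1}Y^{(k)})$;
    $L^{(k+1)} = U\,\mathfrak{L}_{u_k^{-1}}(S)\,V^\top$;
    $C^{(k+1)} = \mathfrak{C}_{\lambda u_k^{-1}}(M - E^{(k)} - L^{(k+1)} + u_k^{-1}Y^{(k)})$;
    $E^{(k+1)} = \mathcal{P}_{\Omega^c}(M - L^{(k+1)} - C^{(k+1)} + u_k^{-1}Y^{(k)})$;
    $Y^{(k+1)} = Y^{(k)} + u_k(M - E^{(k+1)} - L^{(k+1)} - C^{(k+1)})$;
    $u_{k+1} = \alpha u_k$;
    $k = k + 1$;
end while

return $(L^{(k+1)}, C^{(k+1)})$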
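The loop of Algorithm 2 can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the function name `alm_rmc`, the default $u_0 = 1/\|M\|_F$, the tolerance `tol=1e-6` and the iteration cap are all choices made for this example, and $M$ is assumed to be nonzero.

```python
import numpy as np

def entry_shrink(S, eps):
    """Entry-wise soft-thresholding."""
    return np.sign(S) * np.maximum(np.abs(S) - eps, 0.0)

def column_shrink(C, eps):
    """Column-wise soft-thresholding: shrink each column's l2 norm by eps."""
    norms = np.linalg.norm(C, axis=0)
    scale = np.where(norms > eps, 1.0 - eps / np.maximum(norms, 1e-300), 0.0)
    return C * scale

def alm_rmc(M, mask, lam, alpha=1.1, tol=1e-6, max_iter=500):
    """ALM iterations for min ||L||_* + lam ||C||_{1,2} on the observed entries."""
    M = np.where(mask, M, 0.0)            # enforce P_{Omega^c}(M) = 0
    norm_M = np.linalg.norm(M)
    u = 1.0 / norm_M                      # assumed choice of u_0
    Y = np.zeros_like(M)
    L = np.zeros_like(M)
    C = np.zeros_like(M)
    E = np.zeros_like(M)
    for k in range(1, max_iter + 1):
        U, s, Vt = np.linalg.svd(M - E - C + Y / u, full_matrices=False)
        L = (U * entry_shrink(s, 1.0 / u)) @ Vt       # singular value thresholding
        C = column_shrink(M - E - L + Y / u, lam / u)
        E = np.where(mask, 0.0, M - L - C + Y / u)    # supported on unobserved entries
        R = M - E - L - C                             # constraint residual
        Y = Y + u * R
        u *= alpha
        if np.linalg.norm(R) <= tol * norm_M:
            break
    return L, C, k

rng = np.random.default_rng(0)
M = rng.standard_normal((8, 1)) @ rng.standard_normal((1, 6))   # rank-one toy matrix
mask = np.ones_like(M, dtype=bool)                               # fully observed
L, C, iters = alm_rmc(M, mask, lam=0.5, max_iter=50)
print(iters, np.linalg.norm(M - L - C))
```

The auxiliary variable $E$ absorbs the unobserved entries, so the equality constraint is only enforced on $\Omega$.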



In the last proposition, we require A satisfies condition ( [34[ i. Following similar lines in section IV-E such A 
exists if 7 < 7 with 

7 < 

1-7 " ''Vo^^log^(4f^l)/log^(p) 
for some 772 sufficiently small. In this case, we can take 

1 / 1 



This proves Theorem [2] 



A _ . 

4 V 7?^ 



V. Implementation and Simulations 



To facilitate fast and efficient solution, we use a family of algorithms called Augmented Lagrange Multiplier (ALM) methods (see e.g., [16]), shown to be effective on problems involving nuclear norm minimization. We have adapted this method to our $\|\cdot\|_* + \lambda\|\cdot\|_{1,2}$-type problem; see Algorithm 2.

Here $\mathfrak{L}_\epsilon(S)$ is the entry-wise soft-thresholding operator: if $|S_{ij}| \le \epsilon$, then set it to zero, otherwise let $S_{ij} := S_{ij} - \epsilon S_{ij}/|S_{ij}|$. Similarly, $\mathfrak{C}_\epsilon(C)$ is the column-wise soft-thresholding operator: if $\|C_i\|_2 \le \epsilon$, then set it to zero, otherwise let $C_i := C_i - \epsilon C_i/\|C_i\|_2$. In our experiments, we choose $u_0 = (\|M\|_F)^{-1}$ and $\alpha = 1.1$, and the
criterion for convergence is 



I\\M\\f < 10 
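The two soft-thresholding operators can be illustrated on small inputs. The sketch below uses hypothetical function names (`entry_soft_threshold`, `column_soft_threshold`) and hand-picked values, chosen so that the results can be checked by hand.

```python
import numpy as np

def entry_soft_threshold(S, eps):
    """Zero entries with |S_ij| <= eps; otherwise move them towards zero by eps."""
    return np.sign(S) * np.maximum(np.abs(S) - eps, 0.0)

def column_soft_threshold(C, eps):
    """Zero columns with ||C_i||_2 <= eps; otherwise shrink their norm by eps."""
    norms = np.linalg.norm(C, axis=0)
    scale = np.where(norms > eps, 1.0 - eps / np.maximum(norms, 1e-300), 0.0)
    return C * scale

S = np.array([3.0, -0.5, 1.0, -2.0])
print(entry_soft_threshold(S, 1.0))       # entries 3 and -2 shrink to 2 and -1

C = np.array([[3.0, 0.3],
              [4.0, 0.4]])                # column norms are 5 and 0.5
print(column_soft_threshold(C, 1.0))      # first column scaled by 0.8, second zeroed
```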



The first set of experiments demonstrates the power of the manipulator: even a single adversarially
corrupted column can arbitrarily skew the prediction of standard matrix completion algorithms. In our
experiments, we fix $n = p = 400$ and $\gamma = 1/400$. For different $\rho$ and $r$, we generate the low-rank matrix $L_0$
by forming the product $L_0 = AB^\top$. The matrices $A \in \mathbb{R}^{p \times r}$ and $B \in \mathbb{R}^{n(1-\gamma) \times r}$ have i.i.d. standard Gaussian
entries. The single corrupted column $C_0 \in \mathbb{R}^{p \times 1}$ is chosen identical to the first column of $L_0$ except for the last entry,
which is assigned a large value (10 in our experiments). (In collaborative filtering, this corresponds to a manipulator
trying to promote the last movie.) The set of observed entries in the uncorrupted columns is chosen uniformly at
random from all subsets of $[p] \times [n_1]$ of size $\rho \times p n_1$. Set $M = [L_0 \;\, C_0]$. $\mathcal{P}_\Omega(M)$ and $\Omega$ are then given as input.
We apply both our algorithm and standard nuclear-norm-based matrix completion. As shown in Figure 1, standard
matrix completion fails for essentially all values of $\rho$ and $r$, while our algorithm is almost unaffected. Here, for each
pair $(\rho, r)$ we run the experiment 5 times and plot the frequency of success. Our figures show the number
of successes in grayscale, where white denotes all successes and black denotes all failures.
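The data for this first experiment can be generated along the following lines. This is a sketch only: the function name, the seed handling, and in particular the rule for which entries of the corrupted column are observed (each with probability `rho`) are assumptions, since the analysis places no restriction on the observed entries of corrupted columns.

```python
import numpy as np


def make_single_column_attack(p=400, n=400, r=10, rho=0.3, big=10.0, seed=0):
    """Build M = [L0 C0] with one corrupted column and a random observation mask."""
    rng = np.random.default_rng(seed)
    n1 = n - 1                            # gamma = 1/n: exactly one corrupted column
    A = rng.standard_normal((p, r))
    B = rng.standard_normal((n1, r))
    L0 = A @ B.T                          # rank-r authentic block of size p x n1

    c0 = L0[:, 0].copy()                  # copy of the first authentic column ...
    c0[-1] = big                          # ... except for a large last entry
    M = np.column_stack([L0, c0])

    # Observe a uniformly random subset of the authentic entries of size rho * p * n1.
    m0 = int(round(rho * p * n1))
    flat = rng.choice(p * n1, size=m0, replace=False)
    rows, cols = np.unravel_index(flat, (p, n1))
    mask = np.zeros((p, n), dtype=bool)
    mask[rows, cols] = True
    # Assumption: entries of the corrupted column are observed with probability rho.
    mask[:, -1] = rng.random(p) < rho

    return np.where(mask, M, 0.0), mask, L0
```

The returned pair `(M_obs, mask)` plays the role of $\mathcal{P}_\Omega(M)$ and $\Omega$, and `L0` is kept only to score the recovery.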

Next, we investigate our algorithm's performance under different numbers of corrupted columns, and under neutral
and adversarial corruption. In the first case, each entry of $C_0$ is i.i.d. Gaussian. In the second case, the corrupted
columns are constructed as follows: for $1 \le i \le \gamma n$, corrupted column $i$ copies the observed entries of clean column
$i$ and fills the other entries with i.i.d. Gaussian noise. We fix $r = 10$ and vary $(\rho, \gamma)$. In both cases, each entry in
$C_0$ is observed with probability $\rho$ independently. Other settings are the same as in the first set of experiments. The
results for our algorithm and standard matrix completion are shown in the left and right panes of Figure 2 for the
first corruption scheme, and in Figure 3 for the second corruption scheme.
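The two corruption schemes can be written out as below. The function names and the string labels `"neutral"` and `"adversarial"` are hypothetical, and the reading of "copies the observed entries of clean column $i$" as "agrees with clean column $i$ wherever that column is observed" is an assumption.

```python
import numpy as np


def make_corrupted_columns(L0, mask_clean, n_corrupt, scheme, rng):
    """Return a p x n_corrupt block C0 under the chosen corruption scheme."""
    p = L0.shape[0]
    C0 = rng.standard_normal((p, n_corrupt))       # neutral: i.i.d. Gaussian entries
    if scheme == "adversarial":
        # Column i mimics clean column i on that column's observed positions
        # and keeps Gaussian noise everywhere else.
        observed = mask_clean[:, :n_corrupt]
        C0 = np.where(observed, L0[:, :n_corrupt], C0)
    elif scheme != "neutral":
        raise ValueError("scheme must be 'neutral' or 'adversarial'")
    return C0


def observe_corrupted(C0, rho, rng):
    """Each entry of C0 is observed independently with probability rho."""
    return rng.random(C0.shape) < rho
```

Stacking `[L0, C0]` and the corresponding masks then gives the input for a single $(\rho, \gamma)$ cell of the grid.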
Fig. 1. Experiment results for a 400 x 400 matrix with one corrupted column. We plot the probability of successful recovery of the low-
rank matrix. Panes (a) and (b) show the results of our approach with and without the corrupted column, respectively. Pane (c) shows the
essentially complete failure of standard matrix completion, due to the corrupted column.



[Figure panes (a), (b), (c); horizontal axis in each pane: fraction of corrupted columns $\gamma$, with ticks from 0.1 to 0.5.]



Fig. 2. Experiment results for a 400 x 400 rank-10 matrix with different fractions of observed entries $\rho$ and fractions of corrupted columns
$\gamma$. Corrupted columns are generated neutrally at random. Panes (a) and (c) show the results of our approach and standard matrix completion,
respectively. Pane (b) shows the results of minimizing a convex combination of the nuclear norm and the matrix $\ell_1$ norm.



Comparison to Low-rank Plus Sparse. When only a small fraction of the entries are observed, the corrupted
columns $\mathcal{P}_\Omega(C_0)$ can be viewed as a sparse matrix. Therefore, to separate $L_0$ from $\mathcal{P}_\Omega(C_0)$, one might think it is
possible to apply the techniques in ||2l, Q, which decompose a low -rank matrix and a sparse matrix from their 
sum. In particular, given input $\mathcal{P}_\Omega(M)$, one attempts to decompose it by solving the following convex program:

$\min_{L, C} \; \|L\|_* + \lambda \|C\|_1$ (35)

s.t. $\mathcal{P}_\Omega(L + C) = \mathcal{P}_\Omega(M)$.

Our approach specifically deals with corrupted columns, in order to deal with persistent corruption. It is therefore
no surprise that using the above program instead is not successful. We illustrate this
numerically in Figures 2 and 3, using the same synthetic data described above.
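A small numerical example may help show why the entry-wise penalty in (35) does not single out corrupted columns. The sketch below assumes the usual definitions, $\|C\|_1$ as the sum of absolute entries and $\|C\|_{1,2}$ as the sum of column $\ell_2$ norms; the helper names are hypothetical.

```python
import numpy as np


def norm_1(C):
    """Entry-wise l1 norm: sum of absolute values of all entries."""
    return float(np.abs(C).sum())


def norm_1_2(C):
    """Sum of the l2 norms of the columns."""
    return float(np.linalg.norm(C, axis=0).sum())


p, n = 4, 6

# Four nonzero entries concentrated in a single column.
column_sparse = np.zeros((p, n))
column_sparse[:, 2] = 1.0

# The same four nonzero entries scattered over four different columns.
scattered = np.zeros((p, n))
scattered[np.arange(p), np.arange(p)] = 1.0

print(norm_1(column_sparse), norm_1(scattered))      # 4.0 4.0
print(norm_1_2(column_sparse), norm_1_2(scattered))  # 2.0 4.0
```

The $\ell_1$ norm assigns both patterns the same cost, whereas $\|\cdot\|_{1,2}$ is strictly smaller when the mass is concentrated in one column, which is the structure the column-wise penalty is designed to favor.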



VI. Discussion 

In this paper, we provide an efficient algorithm for matrix completion, when some number of the columns 
are arbitrarily corrupted. As our computational results show, ignoring the outliers can have severe consequences, 
with even a single corrupted column jeopardizing the recovery of the low-rank matrix. Similarly, other approaches 
dealing with corruption in matrix completion, in particular those considering matrix-sparse (as opposed to
column-sparse) corruption, are not able to handle the presence of corrupted columns. Our results give sufficient conditions
on the number of samples needed, relative to the number of corrupted columns, to enable recovery. To the best of our
knowledge, these are the first results along these lines. That said, improving these bounds, and also proving lower 
bounds, seems to be an important future direction. 

Our results make no assumptions on the corrupted columns, or on the elements of those columns that are 
revealed. In particular, both the revealed entries and their values can be arbitrarily (potentially maliciously) chosen. 




One area of application of these results is the problem of robust collaborative filtering. Our results provide an
efficient algorithm for collaborative filtering that is impervious to the effect of malicious or manipulative users.
In this paper we have assumed uniform sampling on the authentic columns. Although we do not provide the details
here, this assumption can be relaxed to other sampling distributions. Finally, we note that the results presented here
enable the decomposition of a matrix into a low-rank, a sparse, and a column-sparse matrix.



Appendix 

A. Proof of Lemma^ 

In this section we use $(v)_i$ to denote the $i$th component of a vector $v$. One observes that, by assumption,




for all i. Therefore, we have 



II^Vo(eOll2 < 



< \\Vv„{e,)\\2\\'Pvo{e^)\\2 



< 



111 



We repeat the lemma below for convenience. 

Lemma. Suppose $p < n_1$ and $\Omega_0 \subseteq [p] \times [n_1]$ is a set of $m_0$ entries sampled uniformly at random without
replacement. For any $Z \in T_0$ and $\beta > 1$, we have



pni 
mo 



00,2 >3 V 



with probability at least 1 — (2ni)^ 

Proof: Sample {a,b) uniformly at random without replacement. Define ~ (^eacj ,Vf-^^Z')Vf^{eaeJ) — :^{Z), 
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then the ith column of C is = {caej ,Vf^{Z))Vf^{eael)ei - ^Z^. We have E[^i] = 0, and 



< \\M,Vf^{Z))Vf^{eaeJ)e,\\^ + 
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< 2j^^^ri^\\Z\l, 
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We now compute the second moment. 

= E 
= E 



(36) 
To proceed, we expand $\mathcal{P}_{T_0}(e_a e_b^\top) e_i$. Because $\mathcal{P}_{T_0}(e_a e_b^\top) = \mathcal{P}_{U_0}(e_a) e_b^\top + e_a \mathcal{P}_{V_0}(e_b)^\top - \mathcal{P}_{U_0}(e_a) \mathcal{P}_{V_0}(e_b)^\top$, it
follows that

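$$\|\mathcal{P}_{T_0}(e_a e_b^\top) e_i\|_2 \le \begin{cases} \|\mathcal{P}_{U_0}(e_a)\|_2 + |(\mathcal{P}_{V_0}(e_b))_i| + |(\mathcal{P}_{V_0}(e_b))_i| \, \|\mathcal{P}_{U_0}(e_a)\|_2, & b = i, \\ |(\mathcal{P}_{V_0}(e_b))_i| + |(\mathcal{P}_{V_0}(e_b))_i| \, \|\mathcal{P}_{U_0}(e_a)\|_2, & b \ne i. \end{cases}$$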

Therefore, we obtain 
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(37) 
where in the last two inequalities we use $p < n_1$. Substituting back into Eq. (36), we obtain
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Similarly, we have 
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where the last inequaUty is due to Eq.([37]l. Now note that {v^Vno'Pf^^iZ) - ^'P^^(Z)^ = Er=i d''^ where 

s are copies of sampled without replacement. We are ready to use the Noncommutative Bernstein Inequality 
to bound the probability 



mo 



fe=l 



(fc) 



> r 



We now have M = 2^f^^^\\Z\\^^, and = ||Z||L.2. Set 

r=3-2/3^^1og2(2ni)||Z|L,2- 



Since we have 



the Noncommutative Bernstein Inequality thus gives



>^-2/3j^log^(2ni)||^||^,2 
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= (j5+l)exp(-log(2ni)) 



By union bound, we have 
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with probability at least 1 — (2ni) 
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B. Proof of Lemma 10 



We need the following straightforward lemma, whose proof is omitted.
Lemma 11. For any two matrices $A$ and $B$, we have $\langle A, B \rangle \le \|A\|_{1,2} \, \|B\|_{\infty,2}$.
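Although the proof is omitted, the inequality of Lemma 11 is easy to check numerically. The sketch below assumes that $\|A\|_{1,2}$ is the sum of the column $\ell_2$ norms and $\|B\|_{\infty,2}$ is the largest column $\ell_2$ norm; the helper names are hypothetical.

```python
import numpy as np


def inner(A, B):
    """Trace inner product <A, B> = sum_ij A_ij * B_ij."""
    return float(np.sum(A * B))


def norm_1_2(A):
    """Sum of the l2 norms of the columns."""
    return float(np.linalg.norm(A, axis=0).sum())


def norm_inf_2(B):
    """Largest l2 norm among the columns."""
    return float(np.linalg.norm(B, axis=0).max())


rng = np.random.default_rng(0)
gaps = []
for _ in range(1000):
    A = rng.standard_normal((5, 7))
    B = rng.standard_normal((5, 7))
    gaps.append(norm_1_2(A) * norm_inf_2(B) - inner(A, B))
min_gap = min(gaps)  # nonnegative up to rounding if the inequality holds

# The bound is attained when A is supported on the largest column of B and aligned with it.
B_tight = np.array([[3.0, 0.0], [4.0, 0.0]])
A_tight = B_tight.copy()
```

The argument behind the check is Cauchy-Schwarz applied column by column: $\langle A, B \rangle = \sum_i \langle A_i, B_i \rangle \le \sum_i \|A_i\|_2 \|B_i\|_2 \le \|B\|_{\infty,2} \sum_i \|A_i\|_2$.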

Now we proceed to prove Lemma 10. We repeat the lemma below for convenience.
Lemma. Suppose $\Omega_0 \subseteq [p] \times [n_1]$ is a set of $m_0$ entries sampled uniformly at random without replacement. For
any matrix Z G M"^'', we have 
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with probability at least 1 — (ni + p) " provided mo > ^l3^of{ni + p) log(ni + p). 

Proof: Let Z"^ = TZ {Vi^^ (Z^)). Sample (a, 6) uniformly at random without replacement from [p]x[ni]. Define 
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Note that (^7's=j([/o^^) - ^UqZ^^ Er=i d''^ where ^^''^'s are copies of sampled without replacement. 
We are ready to use the Noncommutative Bernstein Inequality to bound the probability 
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We now have M = 2jn^ 
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provided toq ^ ^P{ni +p)log(ni +p). The Noncommutative Bernstein hiequality thus gives 
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By union bound, we have 
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Therefore, with probability at least 1 — [ni +p)^^^'^. 
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C. Sampling with batch replacement 

In this section we argue that if a dual certificate exists with high probability under the sampling with batch
replacement (SWBR) model, then the probability that a dual certificate exists under the sampling without replacement
(SWoR) model with the same number of observations $m$ can only be higher. The argument follows the same spirit
as |li |. Suppose under the sampling with batch replacement model, we sample for a total of m = s x g times and 
obtain a set of $\tilde{m} \le m$ distinct entries.
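The relation between the number of draws and the number of distinct entries can be illustrated with a short simulation. The batch model used here is an assumption made for the sketch: `s` batches are drawn independently, and each batch consists of `batch_size` entries chosen uniformly without replacement, so repeats can occur only across batches. The function and parameter names are hypothetical.

```python
import numpy as np


def sample_with_batch_replacement(num_entries, s, batch_size, rng):
    """Draw s independent batches; return the batches and the distinct indices overall."""
    batches = [
        rng.choice(num_entries, size=batch_size, replace=False)  # no repeats inside a batch
        for _ in range(s)
    ]
    distinct = np.unique(np.concatenate(batches))                 # repeats across batches collapse
    return batches, distinct


rng = np.random.default_rng(0)
p, n1, s, batch_size = 50, 60, 8, 200
batches, distinct = sample_with_batch_replacement(p * n1, s, batch_size, rng)
m = s * batch_size        # total number of draws
m_tilde = len(distinct)   # number of distinct entries, never more than m
```

Conditioned on `m_tilde`, the distinct locations are uniformly distributed among all subsets of that size, which is what allows the comparison with sampling without replacement in the argument that follows.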

Recall that in Section IV-D we prove that under the following set of (deterministic) conditions 

• $c < 1$,

• $n_1 > 32$,

• $\lambda$ satisfies Eq. (24),
$Q$ is a dual certificate with high probability under SWBR. Then under SWoR the probability that a dual certificate
exists is at least as high. This is because



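$$
\begin{aligned}
&\mathbb{P}_{\mathrm{SWBR}}\left[ Q \text{ is a dual certificate} \right] \\
&\quad = \sum_{i=1}^{m} \mathbb{P}_{\mathrm{SWBR}}[\tilde{m} = i] \, \mathbb{P}_{\mathrm{SWBR}}\left[ Q \text{ is a dual certificate} \mid \tilde{m} = i \right] \\
&\quad \le \sum_{i=1}^{m} \mathbb{P}_{\mathrm{SWBR}}[\tilde{m} = i] \, \mathbb{P}_{\mathrm{SWBR}}\left[ \text{a dual certificate exists in the space spanned by the } \tilde{m} \text{ distinct entries} \mid \tilde{m} = i \right] \\
&\quad \le \sum_{i=1}^{m} \mathbb{P}_{\mathrm{SWBR}}[\tilde{m} = i] \, \mathbb{P}_{\mathrm{SWoR}}\left[ \text{a dual certificate exists in the space spanned by the } m \text{ entries} \right] \\
&\quad = \mathbb{P}_{\mathrm{SWoR}}\left[ \text{a dual certificate exists in the space spanned by the } m \text{ entries} \right].
\end{aligned}
$$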
There is one more subtlety here. When we write

$\mathbb{P}_{\mathrm{SWBR}}\left[ Q \text{ is a dual certificate} \mid \tilde{m} = i \right]$,

the randomness comes not only from the locations of the $\tilde{m}$ distinct entries, but also from how these $\tilde{m}$ entries are
allocated to the $s$ batches in the Golfing Scheme. On the other hand, for

$\mathbb{P}_{\mathrm{SWBR}}\left[ \text{a dual certificate exists in the space spanned by the } \tilde{m} \text{ distinct entries} \mid \tilde{m} = i \right]$,

the only randomness is in the locations of the $\tilde{m}$ entries. However, the following relation still holds:

$\mathbb{P}_{\mathrm{SWBR}}\left[ Q \text{ is a dual certificate} \mid \tilde{m} = i \right]$
$\le \mathbb{P}_{\mathrm{SWBR}}\left[ \text{a dual certificate exists in the space spanned by the } \tilde{m} \text{ distinct entries} \mid \tilde{m} = i \right]$,
as can be shown by a straightforward counting argument.
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